Capability Scoring Server And Related Methods For Interactive Music Systems

ABSTRACT

Capability scoring server systems and related methods are disclosed for interactive music systems. In certain embodiments, an interactive music server system communicates network packets with interactive music client systems associated with one or more interactive music sessions and determines operational parameters associated with the interactive music client systems. The operational parameters can include, for example, latency score results for network latency testing for the interactive music client systems. The latency score results can be used for a variety of purposes including predicting latency score results, controlling access to interactive music sessions, filtering lists of interactive music client systems, and/or other purposes. The operational parameters can also include, for example, internal capabilities information for the interactive music client systems. The internal capabilities information can also be used for a variety of purposes including controlling access to interactive music sessions and/or other purposes. Other variations can also be implemented.

RELATED APPLICATIONS

This application claims priority to the following co-pending provisional application: U.S. Provisional Patent Application Ser. No. 61/950,377, filed Mar. 10, 2014, and entitled “SYSTEMS AND METHODS FOR INTERACTIVE MUSIC,” which is hereby incorporated by reference in its entirety.

This application is also related in subject matter to the following concurrently filed applications: U.S. patent application Ser. No. ______, entitled “DISTRIBUTED RECORDING SERVER AND RELATED METHODS FOR INTERACTIVE MUSIC SYSTEMS;” U.S. patent application Ser. No. ______, entitled “DISTRIBUTED METRONOME FOR INTERACTIVE MUSIC SYSTEMS;” U.S. patent application Ser. No. ______, entitled “PACKET RATE CONTROL AND RELATED SYSTEMS FOR INTERACTIVE MUSIC SYSTEMS;” U.S. patent application Ser. No. ______, entitled “TRACK BASED MUSIC MANAGEMENT SERVER AND RELATED METHODS FOR INTERACTIVE MUSIC SYSTEMS;” and U.S. patent application Ser. No. ______, entitled “NETWORK CONNECTION SERVERS AND RELATED METHODS FOR INTERACTIVE MUSIC SYSTEMS;” each of which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The disclosed embodiments relate to network-based systems for music sessions and associated audio transmissions among network connected systems.

BACKGROUND

Musicians often collaborate in music sessions where each musician is present within a recording studio and a session recording is made. Musicians also collaborate to create session recordings where sub-groups of musicians separately record their portions or tracks of the music recording at the recording studio, and the studio then combines the recordings to form a master recording. Musicians also collaborate in music sessions in less formal environments, such as home studios and garages. With the growth of network connected systems, efforts have been made to provide collaborative music sessions through network connections and the internet. However, these efforts suffer from latency and other network connectivity issues that degrade the experience of the users to an extent that interactive collaboration or a group session cannot effectively be achieved.

SUMMARY

Capability scoring server systems and related methods are disclosed for interactive music systems. In certain embodiments, an interactive music server system communicates network packets with interactive music client systems associated with one or more interactive music sessions and further communicates with the interactive music client systems to determine operational parameters associated with the interactive music client systems. The operational parameters can include, for example, latency score results for network latency testing for the interactive music client systems. The latency score results can be used for a variety of purposes including predicting latency score results for different interactive music client systems, controlling access to interactive music sessions, filtering lists of interactive music client systems, and/or other purposes. The operational parameters can also include, for example, internal capabilities information for the interactive music client systems. The internal capabilities information can also be used for a variety of purposes including controlling access to interactive music sessions and/or other purposes. Different features and variations can be implemented, as desired, and related systems and methods can be utilized, as well.

For one embodiment, an interactive music server system is disclosed that includes a network interface and one or more processing devices configured to communicate network packets through the network interface with interactive music client systems associated with one or more interactive music sessions and to communicate with the interactive music client systems to determine operational parameters associated with the interactive music client systems.

In further embodiments, the one or more processing devices are further configured to instruct the interactive music client systems to perform latency tests with other interactive music client systems to generate latency score results as the operational parameters, and the one or more processing devices are further configured to receive the latency score results from the interactive music client systems and to store the latency score results in one or more data storage systems.
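
For purposes of illustration only, the following Python sketch shows one possible server-side bookkeeping structure for stored latency score results; the class and method names (e.g., LatencyScoreStore, record_result) are hypothetical and are not required by the disclosed embodiments.

    # Hypothetical sketch of server-side storage for reported latency score
    # results; names and structure are illustrative only.
    from collections import defaultdict
    from statistics import mean

    class LatencyScoreStore:
        """Stores latency test results reported by music node (MN) clients."""

        def __init__(self):
            # Keyed by an unordered client pair so that results for A-to-B
            # and B-to-A share one history.
            self._results = defaultdict(list)

        def record_result(self, client_a, client_b, latency_ms):
            """Store one reported latency test result in milliseconds."""
            self._results[frozenset((client_a, client_b))].append(latency_ms)

        def score(self, client_a, client_b):
            """Return the mean stored latency for a pair, or None if untested."""
            history = self._results.get(frozenset((client_a, client_b)))
            return mean(history) if history else None

    store = LatencyScoreStore()
    store.record_result("MN-A", "MN-B", 12.4)
    store.record_result("MN-A", "MN-B", 13.1)
    print(store.score("MN-A", "MN-B"))  # 12.75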

In additional embodiments, the one or more processing devices are further configured to use the stored latency score results for one or more interactive music client systems to predict latency score results for one or more different interactive music client systems. In further embodiments, the one or more processing devices are further configured to predict latency score results based upon a shared internet service provider and a shared local geographic area between interactive music client systems. Still further, the shared local geographic area can be at least one of a shared city or a shared zip code. In other embodiments, the latency score results for the latency tests include one or more of the following: a network latency including time for an audio packet to travel within a network between interactive music client systems, a transmit latency including time for an interactive music client system to generate a transmit audio packet, a receive latency including time for an interactive music client system to process a receive audio packet, or a latency associated with communications through a proxy server.
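
For illustration, one possible heuristic for such prediction is sketched below in Python: a stored result for a measured client pair is reused for a new, unmeasured pair when each endpoint shares the same internet service provider and zip code as the corresponding measured client. The profile fields and matching rule are hypothetical assumptions, not a required implementation.

    # Hypothetical prediction of a latency score from a shared ISP and a
    # shared local geographic area (zip code); data and rule are examples.
    profiles = {
        "MN-A": {"isp": "ISP-1", "zip": "78701"},
        "MN-B": {"isp": "ISP-2", "zip": "10001"},
        "MN-C": {"isp": "ISP-1", "zip": "78701"},  # same ISP/zip as MN-A
    }
    measured = {frozenset(("MN-A", "MN-B")): [12.4, 13.1]}  # stored results

    def _matches(p, q):
        # Two clients are considered equivalent for prediction when they
        # share an internet service provider and a local geographic area.
        return p["isp"] == q["isp"] and p["zip"] == q["zip"]

    def predict_latency(client_x, client_y):
        samples = []
        for pair, history in measured.items():
            a, b = tuple(pair)
            if (_matches(profiles[a], profiles[client_x])
                    and _matches(profiles[b], profiles[client_y])) or \
               (_matches(profiles[a], profiles[client_y])
                    and _matches(profiles[b], profiles[client_x])):
                samples.extend(history)
        return sum(samples) / len(samples) if samples else None

    print(predict_latency("MN-C", "MN-B"))  # reuses MN-A/MN-B history: 12.75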

In still further embodiments, the one or more processing devices are further configured to limit latency tests based upon one or more filters. In addition, the one or more filters can include at least one of the following: a distance filter associated with geographic distance between interactive music client systems, a frequency filter associated with a rate of latency tests, or existence of stored latency data suitable for predictive purposes between a pair of interactive music clients.
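
The following Python sketch illustrates how such filters might be combined to decide whether to run a test for a given client pair; the threshold values and function names are hypothetical examples only.

    # Hypothetical combination of the test-limiting filters described above.
    import time

    MAX_TEST_DISTANCE_KM = 1500.0   # distance filter threshold (example)
    MIN_TEST_INTERVAL_S = 3600.0    # frequency filter: one test/hour (example)

    def should_run_latency_test(distance_km, last_test_time,
                                has_predictive_data, now=None):
        """Return True only if no filter blocks a latency test for this pair."""
        now = time.time() if now is None else now
        if distance_km > MAX_TEST_DISTANCE_KM:
            return False  # clients too far apart to be a likely session pairing
        if last_test_time is not None and (now - last_test_time) < MIN_TEST_INTERVAL_S:
            return False  # this pair was tested too recently
        if has_predictive_data:
            return False  # stored latency data already covers this pair
        return True

    print(should_run_latency_test(800.0, None, False))  # True: no filter applies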

In yet further embodiments, the one or more processing devices are further configured to receive an access request from an interactive music client system to join an interactive music session and to use latency score results associated with the requesting interactive music client system to allow or disallow the access request. In additional embodiments, the one or more processing devices are further configured to allow an interactive music client system associated with the interactive music session to control approval or disapproval of the access request based upon latency score results. Still further, the interactive music session can be a currently active session or a future scheduled session. In still further embodiments, the one or more processing devices are further configured to allow an interactive music client system to search, filter, or order a displayed list of other interactive music client systems based upon latency score results.
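
For illustration, the sketch below gates a join request on latency scores to all current session members; the 15 millisecond limit echoes the low latency target described later in this disclosure, while the gating policy itself is a hypothetical example.

    # Hypothetical access control based upon latency score results.
    SESSION_LATENCY_LIMIT_MS = 15.0  # example threshold

    def handle_access_request(requester, session_members, score_lookup):
        """Allow the join only if the requester's latency score to every
        current session member is known and within the session limit."""
        for member in session_members:
            score = score_lookup(requester, member)
            if score is None or score > SESSION_LATENCY_LIMIT_MS:
                return False  # unknown or excessive latency: disallow
        return True

    scores = {frozenset(("MN-C", "MN-B")): 12.75}
    lookup = lambda a, b: scores.get(frozenset((a, b)))
    print(handle_access_request("MN-C", ["MN-B"], lookup))  # True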

In additional embodiments, the one or more processing devices are further configured to communicate with the interactive music client systems to determine internal capabilities information for the interactive music client systems as the operational parameters, and the one or more processing devices are further configured to receive the internal capabilities information from the interactive music client systems and to store the internal capabilities information in one or more data storage systems. In addition, the internal capabilities information can include one or more of the following: concurrent decode capabilities, packet processing rate capabilities, audio processing capabilities, video processing capabilities, or network bandwidth capabilities. In further embodiments, the one or more processing devices are further configured to receive an access request from an interactive music client system to join an interactive music session and to use internal capabilities information associated with the requesting interactive music client system to allow or disallow the access request. In still further embodiments, the one or more processing devices are further configured to instruct interactive music client systems within an interactive music session to apply packet rate throttling based upon internal capabilities information for at least one of the interactive music client systems within the interactive music session.
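
As one illustrative sketch of such throttling, the server could instruct every session member to send at a packet rate no higher than the least capable member can process; the capability field and the 400 packets-per-second default below borrow example numbers used elsewhere in this disclosure, and the function name is hypothetical.

    # Hypothetical session-wide packet rate throttling keyed to the least
    # capable interactive music client system in the session.
    DEFAULT_PACKET_RATE = 400  # packets per second (2.5 ms frames, example)

    def session_packet_rate(member_capabilities):
        """Throttle all members to the slowest reported packet
        processing rate capability."""
        slowest = min(c["max_packet_rate"] for c in member_capabilities)
        return min(DEFAULT_PACKET_RATE, slowest)

    members = [
        {"id": "MN-A", "max_packet_rate": 400},
        {"id": "MN-B", "max_packet_rate": 200},  # least capable member
    ]
    print(session_packet_rate(members))  # 200: rate instructed to all members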

For another embodiment, a method to determine operational parameters for an interactive music system is disclosed that includes communicating network packets with interactive music client systems associated with one or more interactive music sessions and communicating with the interactive music client systems to determine operational parameters associated with the interactive music client systems.

In further embodiments, the method further includes instructing the interactive music client systems to perform latency tests with other interactive music client systems to generate latency score results as the operational parameters, receiving the latency score results from the interactive music client systems, and storing the latency score results in one or more data storage systems.

In additional embodiments, the method further includes using the stored latency score results for one or more interactive music client systems to predict latency score results for one or more different interactive music client systems. In further embodiments, the method also includes predicting latency score results based upon a shared internet service provider and a shared local geographic area between interactive music client systems. Still further, the shared local geographic area can be at least one of a shared city or a shared zip code. In other embodiments, the latency score results for the latency tests include one or more of the following: a network latency including time for an audio packet to travel within a network between interactive music client systems, a transmit latency including time for an interactive music client system to generate a transmit audio packet, a receive latency including time for an interactive music client system to process a receive audio packet, or a latency associated with communications through a proxy server.

In still further embodiments, the method also includes limiting latency tests based upon one or more filters. In addition, the one or more filters can include at least one of the following: a distance filter associated with geographic distance between interactive music client systems, a frequency filter associated with a rate of latency tests, or existence of stored latency data suitable for predictive purposes between a pair of interactive music clients.

In yet further embodiments, the method also includes receiving an access request from an interactive music client system to join an interactive music session and using latency score results associated with the requesting interactive music client system to allow or disallow the access request. In additional embodiments, the method further includes allowing an interactive music client system associated with the interactive music session to control approval or disapproval of the access request based upon latency score results. Still further, the interactive music session can be a currently active session or a future scheduled session. In still further embodiments, the method also includes allowing an interactive music client system to search, filter, or order a displayed list of other interactive music client systems based upon latency score results.

In additional embodiments, the method also includes communicating with the interactive music client systems to determine internal capabilities information for the interactive music client systems as the operational parameters, receiving the internal capabilities information from the interactive music client systems, and storing the internal capabilities information in one or more data storage systems. In addition, the internal capabilities information can include one or more of the following: concurrent decode capabilities, packet processing rate capabilities, audio processing capabilities, video processing capabilities, or network bandwidth capabilities. In further embodiments, the method also includes receiving an access request from an interactive music client system to join an interactive music session and using internal capabilities information associated with the requesting interactive music client system to allow or disallow the access request. In still further embodiments, the method also includes instructing interactive music client systems within an interactive music session to apply packet rate throttling based upon internal capabilities information for at least one of the interactive music client systems within the interactive music session.

Network-based distributed interactive music systems and related methods are also disclosed. The disclosed embodiments achieve reduced network latency and other advantageous features that provide a positive user experience for music sessions using a network-based distributed interactive music system. In part, the disclosed embodiments provide real-time platforms and related methods for interactive and collaborative music performance and production. The interactive music systems allow individuals at different physical locations, from different rooms in one location to locations potentially hundreds of miles apart, to play, produce, and share music in real time across the internet, local area networks, and/or other network connections. The disclosed systems and methods further provide a number of different components that can be used individually or in combination to provide the disclosed aspects and features for the interactive music systems and methods described herein. Different features and variations can be implemented, as desired, and related systems and methods can be utilized, as well.

For one additional embodiment, an interactive music client system is disclosed that includes an audio capture subsystem coupled to one or more audio inputs and to output captured audio data, one or more processing devices coupled to receive the captured audio data and to process the captured audio data to generate audio output packets including audio output data associated with one or more interactive music sessions, and a network interface coupled to receive the audio output packets and to send the audio output packets to one or more peer interactive music client systems through a network.

In further embodiments, the interactive music client system further includes one or more storage systems coupled to the one or more processing devices to store data associated with one or more interactive music sessions. In additional embodiments, the network interface is further coupled to receive audio input packets containing audio input data from one or more peer interactive music client systems through a network, and the one or more processing devices are further coupled to receive the audio input packets and to process the audio input packets to generate audio input data. In other embodiments, the interactive music client system further includes an audio output subsystem to output audio output signals associated with the audio input data. In still further embodiments, the one or more processing devices are further configured to perform at least one of the following: to communicate with one or more server systems and one or more peer interactive music client systems to determine a session link score for the interactive music client system, to register with one or more server systems for a music session, to record one or more tracks associated with a music session, to adjust an input packet rate or an output packet rate for audio packets, to store input audio frames in a jitter buffer and discard one or more frames based upon periodic time windows, to send one or more music cues to one or more other interactive music client systems within a music session, to adjust audio processing based upon virtual location placement within a music session, to communicate with one or more other interactive music client systems within a music session to provide a distributed metronome, or to provide an output queue for one or more other interactive music client systems within a music session and adjust a rate for the audio output data for each output queue.

For one further embodiment, an interactive music server system is disclosed that includes a network interface coupled to receive network packets through a network from one or more interactive music client systems associated with one or more interactive music sessions and one or more processing devices coupled to receive the network packets, to process the network packets, and to output network packets to the interactive music client systems through the network using the network interface.

In additional embodiments, the interactive music server system includes one or more storage systems coupled to the one or more processing devices to store data associated with one or more interactive music sessions. In still further embodiments, the one or more processing devices are further configured to perform at least one of the following: to communicate with interactive music client systems to determine session link scores for the interactive music client systems, to register interactive music client systems for music sessions, to provide a registry for music sessions or interactive music client systems or both, to receive and store recorded tracks associated with a music session and allow these recorded tracks to be downloaded to interactive music client systems participating in the music session, to stream live broadcasts for music sessions, or to provide access to and download of previously recorded music sessions including different recorded tracks within the recorded music sessions.

Different or additional features, variations, and embodiments can be implemented, if desired, and related systems and methods can be utilized, as well.

DESCRIPTION OF THE DRAWINGS

It is noted that the appended drawings illustrate only example embodiments and are, therefore, not to be considered as limiting of the scope of the inventions, for the inventions may admit to other equally effective embodiments.

FIG. 1 is a block diagram of an example embodiment for a network-based distributed interactive music system.

FIG. 2A is a block diagram of an example embodiment for a music node (MN).

FIG. 2B is a block diagram of an example embodiment for audio/video/network/data subsystems within a music node.

FIG. 2C is a block diagram of an example hardware embodiment for a music node.

FIG. 2D is a block diagram of an example embodiment for network packets that can be transmitted within the interactive music system.

FIG. 3A is a block diagram of an integrated music node embodiment that includes components within one or more electronic devices with one or more connections to the network.

FIG. 3B is a block diagram of an integrated music node embodiment that includes components within one physical electronic device connected to the network.

FIG. 3C is a block diagram of an example music node embodiment where audio components are separated into a dedicated audio processing appliance device.

FIG. 3D is a block diagram of an example embodiment for a session information and control window to provide interactive control for the music session by the user.

FIG. 4A is a block diagram of an example embodiment for a dedicated audio processing appliance device.

FIG. 4B is a circuit and component diagram of an example embodiment for connections to an audio input/output processor for a dedicated audio processing appliance device.

FIG. 4C is a hardware layout diagram of an example embodiment for a dedicated processing appliance device.

FIG. 4D is a block diagram of an example embodiment for an audio software stack including a user space and a kernel coupled to an audio interface.

FIG. 5A is a block diagram of an example embodiment for an interactive music server system.

FIG. 5B is a block diagram of an example hardware embodiment for a server system.

FIG. 6A is a swim lane diagram of an embodiment for latency scoring for two music node (MN) client systems (MNA and MNB) and a server.

FIG. 6B is a swim lane diagram of an example embodiment for MN packet rate scoring.

FIG. 6C is a swim lane diagram of an example embodiment for MN bandwidth scoring.

FIG. 6D is a process flow diagram of an example embodiment for adaptive throttling of packet frame size.

FIG. 6E is a process flow diagram of an example embodiment for adaptive throttling of bandwidth.

FIG. 7A is a representative timing diagram of an example embodiment for a jitter queue.

FIG. 7B is a block diagram of an example embodiment for a jitter queue.

FIG. 7C is a block diagram of an example embodiment for sending MNs having sending queues including decimator/interpolator blocks and encoder/packetizer blocks to adjust send rates for receiving MNs.

FIG. 8A is a swim lane diagram of an example embodiment for a session recording service including one or more server system(s).

FIG. 8B is a block diagram of an example embodiment for a recording system.

FIG. 8C is a block diagram of an example embodiment for a recording system and related recording service where session recordings are stored by a server and by MNs.

FIG. 9A is a signal diagram showing metronome pulses associated with three different local metronomes that are based upon a single metronome pulse.

FIG. 9B is a signal diagram showing metronome pulses associated with three different local metronomes that have been synchronized.

FIG. 10A is a diagram of sound location perception by a person hearing sounds from two sources.

FIG. 10B is a diagram of example locations or positions for music session elements within a virtual space.

FIG. 10C is a diagram of an example dummy head that is depicted to a user and can be adjusted by the user to place and orient the user within the virtual environment for the music session.

FIG. 10D is a diagram of an example dummy head that includes a virtual microphone array of two or more microphones.

FIG. 11A is a block diagram of an example embodiment for a low latency live broadcast.

FIG. 11B is a block diagram of an example embodiment for a high fidelity live broadcast.

FIG. 12A is a block diagram of an example embodiment for MNs within two groups selected as bridges for inter-group communication.

FIG. 12B is a block diagram of an example embodiment for inter-group communications for a larger interconnected group.

FIG. 13A is a block diagram of an example embodiment for a music hinting system that allows non-verbal cues to be communicated among MNs within a music session.

FIG. 13B is a diagram of an example embodiment for a foot-controlled hinting device.

FIG. 14 is a block diagram of an example embodiment for a songs service environment that allows users to access and download songs/tracks/tunes for use with an MN or within a music session.

FIG. 15A is a block diagram of an embodiment including two music nodes (A, B) communicating with each other through an ISP.

FIG. 15B is a block diagram of an embodiment including two music nodes (A, B) communicating with each other through different ISPs.

FIG. 16 is a block diagram of an embodiment including NAAS (network as a service) server systems connecting two independent ISPs.

FIG. 17 is a block diagram of an embodiment including three music nodes (A, B, C) communicating with each other and the server systems to set up a non-NAAS music session.

FIG. 18A is a block diagram of an embodiment including NAAS server systems providing communications among four music nodes for a music session.

FIG. 18B is a block diagram of an embodiment including three music nodes (A, B, C) communicating with each other through two different ISPs.

FIG. 19 is a block diagram of an embodiment including three music nodes (A, B, C) where only A is a NAAS participant.

FIG. 20A is a swim lane diagram of an example embodiment for a music session start by music node A where music nodes B and C then join the session.

FIG. 20B is a swim lane diagram of an example embodiment for a music session stop where music nodes B and C leave the session.

FIGS. 21A-B provide a swim lane diagram of an example embodiment for a music session start by music node A where music nodes B and C then join the session and where all three nodes (A, B, C) are NAAS participants.

FIG. 21C is a swim lane diagram of an example embodiment for a music session stop where music nodes B and C leave the session and where all three nodes (A, B, C) are NAAS participants.

FIGS. 22A-B provide a swim lane diagram of an example embodiment for a music session start by music node A where music nodes B and C then join the session and where only music node C is a NAAS participant.

FIG. 22C is a swim lane diagram of an example embodiment for a music session stop where music nodes B and C leave the session and where only music node C is a NAAS participant.

FIG. 23A is a block diagram of an example embodiment for internode session managers and data flow for an interactive music system including peer connections and session transport communications.

FIG. 23B is a block diagram of an example embodiment for peer connections.

FIG. 24 is a block diagram of an example embodiment for music and chat communications from an MN to other MNs within a music session.

FIG. 25 is a block diagram of an example embodiment for an MN system embodiment including local ICPs (input channel processors) and peer ICPs (input channel processors).

FIG. 26 is a block diagram of an example embodiment for a peer input channel processor.

FIG. 27A is a block diagram of an example embodiment for a local input channel processor that captures audio inputs from an instrument (e.g., guitar, keyboard, voice, etc.), voice chat, or another audio input.

FIG. 27B is a block diagram of an example embodiment for a local input channel processor that captures audio inputs for a group of instruments.

FIG. 27C is a block diagram of an example embodiment for a local input channel processor that captures audio inputs for a group of instruments and aggregates or bonds these inputs using a group mixer.

FIGS. 28A-B are block diagrams of example embodiments for mixers that can be utilized.

FIG. 29 is a block diagram of an example embodiment for virtual device bridge software that includes an application space having a client module and a DAW (digital audio workstation) module and a kernel having virtual audio inputs and outputs.

FIGS. 30A-B are block diagrams of example embodiments for DAW data flow.

DETAILED DESCRIPTION

Network-based interactive music systems and related methods are disclosed. The disclosed embodiments achieve reduced network latency and other advantageous features that provide a positive user experience for music sessions using a network-based interactive music system. In part, the disclosed embodiments provide real-time platforms and related methods for interactive and collaborative music performance and production. The interactive music systems allow individuals at different physical locations, from different rooms in one location to locations potentially hundreds of miles apart, to play, produce, and share music in real time across the internet, local area networks, and/or other network connections. The disclosed systems and methods further provide a number of different components that can be used individually or in combination to provide disclosed aspects and features for the interactive music systems and methods described herein. Different features and variations can be implemented, as desired, and related systems and methods can be utilized, as well.

FIG. 1 is a block diagram of an example embodiment for a network-based interactive music system 100. Music nodes (MN) 112, 114 . . . 116 are client systems for the interactive music system 100 that have one or more network connections to a network 110. These music nodes (MN) 112, 114 . . . 116 are part of one or more interactive music session(s) 150. The music nodes (MN) 112, 114 . . . 116 in part run music node applications (MN APP) 122, 132 . . . 142, respectively, that implement the various functional features described herein. The music nodes (MN) 112, 114 . . . 116 also in part use storage systems 124, 134 . . . 144 to store MN related data, such as audio recordings and other data as described below. The music nodes (MN) 112, 114 . . . 116 also receive one or more audio inputs (AUDIO IN) and produce one or more audio outputs (AUDIO OUT), as described in more detail herein. The interactive music server system(s) 102, 104, 106 . . . provide server-based services and management for the interactive music system 100 and/or the interactive music session(s) 150, as described herein. In part, for example, the interactive music server system(s) 102, 104, 106 . . . manage session setup and tear down for music sessions for the music nodes (MN) 112, 114 . . . 116 participating in interactive music sessions. The server system(s) 102, 104, 106 . . . also in part use storage systems to store MN, session, and service related data such as audio recordings and other data as described below.

It is noted that the music node applications 122, 132 . . . 142 can be downloaded from the interactive music server system(s) 102, 104, 106 . . . through network 110 and installed on the music nodes (MN) 112, 114 . . . 116. The music node applications 122, 132 . . . 142 can also be loaded onto the music nodes (MN) 112, 114 . . . 116 separate from the network 110, if desired. Further, the music nodes (MN) 112, 114 . . . 116 can be any of a wide variety of information handling systems including one or more electronic devices or systems that participate in the interactive music system 100 and/or the interactive music session(s) 150. Each server system 102, 104, 106 . . . can also be any of a wide variety of information handling systems including one or more electronic devices or systems that provide the server-based services for the interactive music system 100 and/or interactive music session(s) 150. The data storage systems can also be a wide variety of devices or components that are configured to store data within a non-transitory data storage medium.

It is also noted that the network 110 can be any variety of wired or wireless network connections and devices through which network communications occur among the music nodes (MN) 112, 114 . . . 116; the server system(s) 102, 104, 106 . . . ; and/or other network connected systems, devices, or components. The network 110 can include the internet, internal intranets, local area networks (LANs), wide area networks (WANs), personal area networks (PANs), wireless networks, wired networks, home networks, routers, switches, firewalls, network interface cards, network interface controllers, and/or any other network communication system, device, or component that provides wired and/or wireless communication connections between electronic systems. Further, these network communication elements can be internal to and/or external from the music nodes (MN) 112, 114 . . . 116; the server system(s) 102, 104, 106 . . . ; and/or other network connected systems, as desired.

Example embodiments for music nodes (MNs) and the server system(s) are further described with respect to FIGS. 2A-2D, FIGS. 3A-D, FIGS. 4A-D, and FIGS. 5A-B. Operational features and embodiments are further described below with respect to FIGS. 6A-E, 7A-C, 8A-C, 9A-B, 10A-D, 11A-B, 12A-B, 13A-B, and 14. Further, APPENDIX A below and FIGS. 15A-B, 16, 17, 18A-B, 19, 20A-B, 21A-C, and 22A-C describe additional embodiments and example details including MN registration, network communications, control messages, and other aspects for the interactive music system and for the NAAS (Network as a Service) server systems that provide lower latency network communications for music sessions. APPENDIX B below and FIGS. 23A-B, 24, 25, 26, 27A-C, 28A-B, 29, and 30A-B provide further example embodiments for the interactive music system including further example embodiments related to music nodes (MNs) and the server system(s). APPENDIX C below provides example APIs (application program interfaces) that can be utilized.

It is noted that the networks described herein can be wired and/or wireless networks that include one or more devices (e.g., routers, switches, firewalls, gateways, interface devices, network servers, etc.) that provide for network communications between network-connected computing devices, including internet communications. As such, it is understood that the network data transfer of frames and packets as described can be implemented using any of a wide variety of techniques, including wired and/or wireless communications between one or more computing systems or devices. It is further noted that the data or file storage systems described herein can be any desired non-transitory tangible medium that stores data, such as data storage devices, FLASH memory, random access memory, read only memory, programmable memory devices, reprogrammable storage devices, hard drives, floppy disks, DVDs, CD-ROMs, and/or any other non-transitory data storage mediums.

It is also noted that the functional blocks, modules, operations, features, and processes described herein for the disclosed embodiments can be implemented using hardware, software, or a combination of hardware and software, as desired. In addition, one or more processing devices running software and/or firmware can also be used to implement the disclosed embodiments. It is further understood that one or more of the operations, tasks, functions, features, or methodologies described herein (e.g., including those performed by the MNs 112, 114 . . . 116; the server system(s) 102, 104, 106 . . . ; and the NAAS server systems 1602) may be implemented, for example, as hardware, software, or a combination of hardware and software, including program instructions that are embodied in one or more non-transitory tangible computer readable mediums (e.g., memory) and that are executed by one or more processors, controllers, microcontrollers, microprocessors, hardware accelerators, and/or other processing devices to perform the operations and functions described herein.

It is also noted that the processing devices described herein can include hardware, software, firmware, or a combination thereof. In one embodiment, the components of the processing devices may form in part a program product with instructions that are accessible to and executable by processing circuitry to perform the functions of the processing devices described herein. The instructions for the program product may be stored in any suitable storage media that is readable by the processing devices, and the storage media may be internal and/or external to the processing devices.

In addition, integrated circuits, discrete circuits, or a combination of discrete and integrated circuits can be used, as desired, to perform the functionality described herein. Further, programmable integrated circuits can also be used, such as FPGAs (field programmable gate arrays), ASICs (application specific integrated circuits), and/or other programmable integrated circuits. In addition, one or more processing devices running software or firmware can also be used, as desired. For example, computer readable instructions embodied in a tangible medium (e.g., data storage devices, FLASH memory, random access memory, read only memory, programmable memory devices, reprogrammable storage devices, hard drives, floppy disks, DVDs, CD-ROMs, and/or any other tangible storage medium) could be utilized to store instructions that cause computer systems, programmable circuitry (e.g., FPGAs), processors, and/or other processing devices to perform the processes, functions, and capabilities described herein.

It is further noted that the MNs 112, 114 . . . 116; the server system(s) 102, 104, 106 . . . ; the NAAS server systems 1602 described below; and/or other electronic computing devices described herein can be implemented using one or more information handling systems that include one or more processing devices (e.g., processor, controller, microcontroller, microprocessor, digital signal processor, and/or other processing device) for executing and otherwise processing instructions, and for performing additional operations (e.g., communicating information) in response thereto. Each such electronic computing device is formed in part by various electronic circuitry components that are configured to perform the device operations. Further, an information handling system may include any instrumentality or aggregate of instrumentalities operable to decode, encode, compute, determine, process, transmit, receive, store, display, communicate, detect, record, reproduce, or utilize any form of information or data for business, scientific, control, or other purposes. For example, an information handling system may be a personal computer (e.g., desktop or laptop), tablet computer, mobile device (e.g., personal digital assistant (PDA) or smart phone), server computer (e.g., blade server or rack server), a network storage device, or any other suitable electronic device and may vary in size, shape, performance, and functionality. The information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU), hardware or software control logic, read only memory (ROM), and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communicating with external devices as well as various input and output (IO) devices, such as a keyboard, a mouse, a touch screen video display, a non-touch screen video display, and/or other devices or components. The information handling system may also include one or more buses operable to transmit communications between the various hardware components and/or to external devices or systems.

Music Node (MN) Client System

A music node (MN) is one or more electronic devices or systems that in part provide audio input/output and related processing for one or more users of the interactive music system. The music node (MN) operates in part as a client system with respect to the server system described below. For one embodiment, the music node includes one or more of the following components: audio capture input subsystem, audio play output subsystem, audio encoder, audio decoder, video input system, user interface and control subsystem, file storage system, and a network interface. Different and/or additional components could also be included, if desired, and variations could be implemented while still providing a music node for the interactive music system embodiments described herein. It is also noted that operation at low latency is desired for the overall user experience, and low latency is preferably less than 15 milliseconds of delay between an audio packet being captured and sent from one MN and being received and processed by another MN.
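
As a simple illustration of this target, the sketch below sums a set of hypothetical per-stage delays against the 15 millisecond budget; the individual stage values are examples only and do not reflect measured figures.

    # Hypothetical end-to-end latency budget check for the sub-15 ms target.
    LATENCY_BUDGET_MS = 15.0

    stage_delays_ms = {
        "capture_and_encode": 2.5,  # e.g., one 2.5 ms audio frame
        "packetize_and_send": 1.0,
        "network_transit": 8.0,
        "jitter_queue": 2.0,
        "decode_and_play": 1.0,
    }
    total_ms = sum(stage_delays_ms.values())
    print(total_ms, total_ms <= LATENCY_BUDGET_MS)  # 14.5 True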

FIG. 2A is a block diagram of an example embodiment for music node (MN) 112. The music node (MN) application 122 includes one or more different functional modules 260, 261, 262, 263, 264, 265, and/or 266 to provide the features of the music nodes as described in more detail below. For example, a registration module 261 is configured to communicate with the server system(s) to provide registration features for the MN 112. A session control module 262 is configured to provide session control options to allow users to control their session experience. A jitter queue module 263 is configured to provide control of the audio frame queue used to communicate with other MNs within a created session through the network 110. A recording module 264 is configured to store recordings of audio inputs received by the MN 112 both locally and through the network 110. A tunes module 266 is configured to provide features associated with the packaged tunes service described below. Other modules 265 can also be provided, as desired. The control module 270 provides overall control for the MN 112 and coordinates the operations of the other functional blocks. As also described herein, the MN application 122 also uses and stores MN data 124, as needed, for its operations. It is further noted that the other music nodes (MN) 114 . . . 116 can be configured similarly to music node (MN) 112 or could be implemented differently, as desired. As such, a wide variety of music node (MN) implementations could be used together within the interactive music system 100 and as part of one or more music sessions 150.

FIG. 2B is a block diagram of an example embodiment for audio/video/network/data subsystems within a music node 112. One or more audio inputs (AUDIO IN) are received by an audio capture input subsystem 202, and digital audio is provided to an audio encoder 206. It is noted that the audio inputs can be analog signals or digital signals. If analog signals are input, then the audio capture input subsystem 202 samples these analog input signals to produce the digital audio. If digital signals are input, then the audio capture input subsystem 202 can send this digital audio to the audio encoder 206 or resample the digital audio inputs and then provide the digital audio to the audio encoder 206. The audio encoder 206 provides encoded audio data to the interactive music controller 250. This encoded audio data can then be stored as audio data 216 within the file storage subsystem 214, which can also store other data 218 associated with the operations of the music node 112. The encoded audio data can also be output through the network interface 230 to the network 110. The encoded audio and/or audio data received from the network 110 through the network interface 230 can be provided by the interactive music controller 250 to an audio decoder 208. The audio decoder 208 decodes the encoded audio data and outputs digital audio to the audio play output subsystem 204. The audio play output subsystem 204 then outputs audio output signals (AUDIO OUT) from the music node 112. The audio play output subsystem 204 can include one or more digital-to-analog converters to convert the digital audio from the audio decoder 208 to analog output signals, or the audio play output subsystem 204 can output the digital audio itself or re-sampled versions of the digital audio as the audio output signals (AUDIO OUT). The music node 112 can also include a display and control subsystem 220 that displays session information 222 and/or one or more graphical user controls 224. A user is thereby allowed to interact with and control the operations of the music node 112 through the display and control subsystem 220. Other input/output (IO) interfaces 226 can also be provided to allow other user IO interfaces or IO interfaces to other electronic systems. It is understood that the interactive music controller 250 communicates with the different blocks within FIG. 2B using one or more control signals or commands to those blocks. Other variations could also be implemented.

FIG. 2C is a block diagram of an example hardware embodiment for music node 112. A system bus 260 provides communications between the different subsystems and components of the music node 112. One or more processor(s) 272 communicate with the audio subsystems 202/204/206/208 using one or more communication paths, with video subsystems 210/212/220 using one or more communication paths, with network interface 230 using one or more communication paths, and with IO subsystems 226 using one or more communication paths. The processor(s) 272 also communicate with non-volatile storage system 274 that stores music node (MN) data 124, such as the audio data 216 and/or other data 218 indicated above. The non-volatile storage system 274 also stores the music node application (MN APP) 122, which can include program instructions that are executed by one or more processor(s) 272 to implement the functions described herein for the music node 112. The non-volatile storage system 274 can be, for example, hard drives, optical discs, FLASH drives, and/or any other desired non-transitory storage medium that is configured to store information. Further, the one or more processor(s) 272 communicate with volatile memory 270 during operations to facilitate their operations. The volatile memory 270 can be, for example, DRAM (dynamic random access memory), SDRAM (synchronous dynamic random access memory), and/or any other desired volatile memory that is configured to store information while powered.

FIG. 2D is a block diagram of an example embodiment 280 for network packets that can be transmitted within the interactive music system 100. A network transmission 282 of network packets is shown for N packets (PKT1, PKT2, PKT3 . . . PKT(N)). As shown with respect to the example packet 284, each of the transmitted packets can be configured to include audio frame data 294, an audio header (HDR) 292, and a protocol header such as an IP/UDP (internet protocol/user datagram protocol) header 290. Each packet can also include optional chat data 298 and a chat header (HDR) 296. It is also noted that the audio header 292 can include session control information, such as, for example, track volume levels, master volume levels, recording start commands, recording stop commands, hinting selections, and/or other session related information. It is also noted that control packets can also be communicated separately from audio related packets among the MNs and between server system(s) and the MNs. Example values for byte sizes and data rates are described with respect to example embodiments below in APPENDIX A. For example, as one embodiment, the audio can be captured and encoded at 256 kilobits per second, and 2.5 millisecond data frames can be used to generate 400 packets-per-second that are then wrapped with header information and transmitted through the network 110. It is further noted that embodiment 280 provides one example packet structure that can be used for network communications for the interactive music system embodiments described herein, and other packet structures could also be utilized. For example, for communications where audio data is not communicated, a network packet can be used that includes header information and a payload having control information, MN related information, and/or other music session information communicated among the music nodes and server system(s). Other packet structures could also be used.
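
For illustration, the arithmetic and a simplified packet build for this example are sketched below in Python: at 256 kilobits per second with 2.5 millisecond frames, each frame carries 256,000 × 0.0025 ÷ 8 = 80 bytes, sent as 400 packets-per-second. The audio header field layout shown is a hypothetical assumption (the IP/UDP header 290 would normally be added by the operating system).

    # Hypothetical build of the packet layout of FIG. 2D (audio header plus
    # audio frame data); the header fields here are illustrative assumptions.
    import struct

    AUDIO_FRAME_BYTES = int(256_000 * 0.0025 / 8)  # 80 bytes per 2.5 ms frame
    PACKETS_PER_SECOND = int(1 / 0.0025)           # 400 packets-per-second

    def build_audio_packet(seq, track_volume, frame):
        """Prepend a small audio header (sequence number, track volume)
        to one encoded audio frame."""
        assert len(frame) == AUDIO_FRAME_BYTES
        header = struct.pack("!IB", seq, track_volume)  # network byte order
        return header + frame

    pkt = build_audio_packet(1, 90, bytes(AUDIO_FRAME_BYTES))
    print(len(pkt), PACKETS_PER_SECOND)  # 85 bytes (before IP/UDP), 400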

Functional blocks within FIG. 2B are now further described, although it is again noted that variations could be implemented for these functional blocks.

Audio Capture Input Subsystem (202).

The audio capture input subsystem converts audio inputs to digital frames of audio information, preferably with low latency. For example, the audio input subsystem can sample analog audio inputs at a selected and/or fixed sampling rate, preferably of at least 44.1 KHz, and can output digital audio frames containing digital audio information, preferably 10 milliseconds (ms) or less of audio information. If the audio input from the audio source is already digital, a digital transfer from the audio source to the audio input subsystem can be utilized, preferably again having low latency. Digital audio frames containing digital information can again be output by the audio input subsystem. Resampling can also be used, as needed, by the audio input subsystem to match digital sample rates between a digital audio source and the audio output frames for the audio input subsystem.
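
As a brief worked example of these capture parameters, a 44.1 KHz sampling rate with 10 ms frames yields 441 samples per frame, as sketched below; the 16-bit mono sample size is an illustrative assumption.

    # Worked frame-size arithmetic for the example capture parameters above.
    SAMPLE_RATE_HZ = 44_100
    FRAME_MS = 10
    BYTES_PER_SAMPLE = 2  # assumed 16-bit mono samples (illustrative)

    samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
    print(samples_per_frame)                     # 441 samples per 10 ms frame
    print(samples_per_frame * BYTES_PER_SAMPLE)  # 882 bytes per frame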

Audio Play Output Subsystem (204).

The audio play output subsystem produces audio output signals, for example, by converting digital audio information to analog output signals. For example, digital audio frames from other MNs can be received and converted to analog output signals. As indicated above, these digital audio frames can include a selected amount of audio information, such as about 10 ms or less of audio information. Resampling can also be used, as needed, to match the digital sample rates between the audio play output subsystem and the audio output destination, such as an external receiver or sound system.

Audio Encoder (206).

The audio encoder encodes or compresses digital audio information to provide compressed audio information. The audio encoder is also preferably low latency. The audio encoder operates to process the digital audio frames of digital audio information captured at the audio input subsystem and produces a compressed audio stream. The audio encoder can also use error correction to embed error correction information that can be used by a decoder to detect and, where possible, correct and recover from errors induced on the audio stream during transmission or storage. The output encoded audio data from the encoder can also be packetized within network packets for transmission over a network.

Audio Decoder (208).

The audio decoder decodes or decompresses incoming audio packets from other MNs or sources to provide uncompressed digital audio outputs. The audio decoder also uses error correction information with the packets to detect errors and apply error recovery to improve the quality of the decoded audio. As such, high quality audio with a high SNR (signal-to-noise ratio) is achieved. Preferably, the audio decoder operates with low latency, and the audio decoder is configured to output audio frames containing 10 ms or less worth of digital audio.

Display and Control Subsystem (220).

The input and display subsystem allows a user to interact with the MN for management, configuration, diagnostics, and general use and/or control. Video of other users in the music session may also be shown on this display.

Video Input Subsystem (210).

If video input is desired, a video input subsystem is used to capture video and preferably operates with low latency. The video input subsystem can be used to allow live video of users playing in a music session to be shared. It is noted that the latency of the video capture subsystem can be allowed to be higher than the latency of the audio input subsystem while not significantly degrading the user's session experience. However, it is still preferable that the MN provide at least 30 frames-per-second of video to ensure a real-time user experience.

File Storage System (214).

A file storage system can also be included to store digital audio information. The MN uses a recording process, which is described further below, to store multiple audio streams concurrently.

Network Interface (230).

An input/output network interface is provided that preferably operates with low latency. The audio processing application input network path of the MN includes a jitter queue buffer management system, which is described in more detail below. The MN also uses the network for interaction with a server that manages the music session, as also described in more detail below. The MN also uses the network for communication with peers in the music session. In general, the following classes of data flows occur in the MN: (1) peer-to-peer music data, (2) peer-to-peer state and session control data, (3) peer-to-peer video data, and (4) server session management and control data. It is also noted that peer-to-peer data may also be sent via a proxy server that may process the data before relaying it to another MN (e.g., aggregate packets, process and mix audio into a single audio stream, and/or perform other desired data processing).
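
For illustration only, a minimal jitter queue structure is sketched below: incoming frames are reordered by sequence number and released once a small buffer depth is reached. The actual jitter queue buffer management (including the periodic-window frame discards noted earlier) is described in more detail below; this structure and its parameters are hypothetical.

    # Minimal, hypothetical jitter queue sketch for the input network path.
    import heapq

    class JitterQueue:
        def __init__(self, depth=4):
            self._heap = []      # min-heap ordered by packet sequence number
            self._depth = depth  # frames buffered before playout begins

        def push(self, seq, frame):
            """Insert a received audio frame, reordering by sequence number."""
            heapq.heappush(self._heap, (seq, frame))

        def pop(self):
            """Return the next in-order frame, or None while still buffering."""
            if len(self._heap) < self._depth:
                return None
            return heapq.heappop(self._heap)[1]

    q = JitterQueue(depth=2)
    q.push(2, b"frame2")
    q.push(1, b"frame1")  # arrived out of order
    print(q.pop())  # b'frame1'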

It is also noted that although the components in FIG. 2B above are described with respect to an embodiment for a music node (MN) 112, different and/or additional components could be utilized in other embodiments. As such, the components can be varied, as desired. Further, the operation of each component could also be varied, if desired.

FIGS. 3A-D provide further different implementation embodiments for the music node (MN) 112. FIG. 3A is a block diagram where components are implemented in one or more electronic devices or systems having independent connections to the network 110. FIG. 3B is a block diagram where components are implemented within a single electronic device or system having at least one connection to the network 110. FIG. 3C is a block diagram where components are implemented using an audio streaming appliance having a separate connection to the network 110. FIG. 3D provides an example embodiment of a graphical user interface providing session management and control for MNs.

Looking now to FIG. 3A, a block diagram is shown of an integrated music node embodiment 320 that includes the components described above within one or more electronic devices with one or more connections to the network 110. Components 302 provide the display and control interface for the music session along with low latency video decode. A session information and control window 310 is displayed to a user that provides session information and control. Components 304 provide the audio input/output including audio input capture, encode, and streaming to the network 110, as well as audio stream receiver, decoder, and local output player. Components 306 provide the video capture, encode, and streaming for local video through a video capture device, such as a video camera. The embodiment 320 can also include direct control paths between the components that are integrated portions of the system.

FIG. 3B is a block diagram of an integrated music node embodiment 330 that includes the components 302/304/306 described above within one physical electronic device 332 connected to the network 110. It is noted that for the embodiment 330 no external network is needed to communicate between the internal components. It is further noted that the audio in/out connections to the embodiment 330 can be through built-in or external connections, such as internal or external USB (universal serial bus) ports connected to one or more audio input sources or output devices. Further, the video capture can use built-in or external video connections, such as internal or external USB ports. A system software stack 334 provides control of the internal operations for the device 332, and the system software stack 334 can be implemented using one or more processor(s) running instructions stored in a non-transitory storage medium, as described herein.

FIG. 3C is a block diagram of an example embodiment 340 of a music node (MN) where audio components 302/304/306 are separated into a dedicated audio processing appliance device 346. As depicted, the dedicated audio processing appliance 346 includes components 306 providing the audio capture, audio input processing, audio encode/decode, and peer-to-peer (P2P) network audio interface. The separate device 342 includes components 302 and 304 providing the video, display, and user input mechanism (e.g., keyboard, mouse, touch-screen, etc.) and any additional remaining parts of the separate device 342. A system software stack 344 also provides control of the internal operations for the device 342, and the system software stack 344 can be implemented using one or more processor(s) running instructions stored in a non-transitory storage medium, as described herein. The separate device 342 can be, for example, a desktop computer, laptop, tablet, smart phone, and/or another computing device.

FIG. 3D is a block diagram of an example embodiment for a session information and control window 310 that is displayed to a user (e.g., through an application graphical user interface (GUI)) to provide in part the interactive control for the music session by the user. As depicted, the window 310 includes a section 352 that shows audio inputs for tracks being recorded by the local music node, such as a guitar input and microphone (voice) input. Related controls are also provided within section 352, such as for example volume controls for each of these tracks, and these controls allow a user to make adjustments to his/her own tracks in the session. A master volume control can also be provided. The window 310 also includes a section 354 that shows live tracks associated with other MNs within the session, such as a microphone (voice) and keyboard inputs for one or more additional MNs in the session. Related controls are also depicted within section 354, such as for example volume controls for each of these tracks, and these controls allow a user to make adjustments to other non-local tracks in the music session. Selection buttons can also be provided to initiate a recording of tracks within the music session. The window 310 also includes a section 356 that shows recordings that have been made for tracks within the music session, such as for example guitar recordings, microphone (voice) recordings, and/or keyboard track recordings. Related controls are also depicted within section 356, such as for example volume controls for each of these recorded tracks, and these controls allow a user to make adjustments to all of the recorded tracks for the music session. Controls can also be provided for play back control of the recordings, such as for example a play button and a position slider for the recordings. It is further noted that additional or different session information and/or controls can also be provided as part of the window 310. Further, it is noted that additional windows could also be used, and information and controls can be organized, as desired, among these windows while still providing session information and control to a user through a graphical user interface displayed by the music node (MN).

FIGS. 4A-D are block diagrams of a further example embodiment for the audio processing appliance 346. FIG. 4A is a block diagram of an example embodiment for a dedicated audio processing appliance device 346. FIG. 4B is a circuit and component diagram of an example embodiment for connections to an audio input/output processor for a dedicated audio processing appliance device. FIG. 4C is a hardware layout diagram of an example embodiment for a dedicated processing appliance device. FIG. 4D is an example embodiment for an audio software stack that can be used with the dedicated audio processing appliance device or with other MN embodiments if a separate audio processing appliance device is not being used to implement the MN.

FIG. 4A is a block diagram of an example embodiment 400 for a dedicated audio processing appliance device 346. For the embodiment depicted, a device body 402 includes one or more external connections and input/output components, such as for example USB (universal serial bus) connections, an SD (secure digital) card reader, a power connector, an RJ45 Ethernet connector, a status LED, a synchronization (sync) button, XLR connectors, a mono connector, an HP (headphone) connector, and/or other desired connections or components. The device body also includes one or more printed circuit boards on which are mounted one or more integrated circuits, discrete components, and electronic communication traces. For example, an audio codec integrated circuit (e.g., PCM3061A from Texas Instruments) can be used that outputs audio such as through the headphone (HP) connector and captures audio inputs (e.g., sampling frequency of 8-96 kHz) such as from the XLR connectors and the mono connector as well as an internal microphone if included. Also, a processor integrated circuit (e.g., iMX6 from Freescale Semiconductor) can be coupled to the audio codec and other components to process the audio input/outputs as well as other MN and music session related input/outputs. Other components could also be included such as EEPROMs (electrically erasable programmable read only memories), DRAMs (dynamic random access memories), clock circuits, crystal circuits, power management integrated circuits, DC-to-DC converters, Ethernet physical (PHY) layer integrated circuits, and/or other desired components.

FIG. 4B is a circuit and component diagram of an example embodiment 420 for connections to an audio codec 430 for a dedicated audio processing appliance device. Example audio connections 422, 424, 426, and 428 are shown, as well as example circuits that can be coupled to one or more printed circuit boards between these audio connections and the audio codec 430. As described above, these components can all be located within a device body for an audio processing appliance device. Audio connection 422 is a headphone connector that is coupled to receive left (L) and right (R) audio outputs from the audio codec 430. Audio connection 428 is a chat microphone connector that is coupled to provide audio input voltages to the audio codec 430. Audio connection 424 is a combined XLR microphone connector and audio line-in connector that is coupled to provide audio input voltages to the audio codec 430. A switch is also provided to switch between the XLR microphone input and the line-in input. Audio connection 426 is similar to audio connection 424. The audio codec 430 captures audio inputs and provides audio outputs and communicates audio data and control information to and from other electronic devices using a digital interface, such as a digital serial interface (e.g., I2S interface). Variations could be implemented as desired.

FIG. 4C is a hardware layout diagram of an example embodiment 450 for a dedicated processing appliance device. The front 402A of the device body includes connectors such as the headphone (HP) jack and the XLR combo connectors. The back 402B of the device body includes connectors such as an Ethernet connector, USB connectors, sync button, and a power connector. The printed circuit board 452 includes one or more integrated circuits and/or other discrete circuits or electrical components, as well as interconnecting electrical traces. While an example layout of components is shown, it is understood that this layout is just one example, and other implementations and layouts could be used.

FIG. 4D is a block diagram of an example embodiment for an audio software stack 460 including a user space 462 and a kernel 464 coupled to an audio interface for the audio codec 430. The software stack 460 can be implemented, for example, as one or more processing devices executing program instructions stored in a non-transitory storage medium. As indicated above, one processing device that can be used is an iMX6 processor from Freescale Semiconductor. The software stack provides low-latency audio input/output. In part, the embodiment depicted captures audio at the codec input and sends chunks (e.g., 2.5 ms chunks) of captured audio to the audio application where they are processed. This processed audio is sent back to the codec to be played as an audio output and is also sent through network communications to peers within a music session. The internal audio input/output latency is preferably less than 3 ms and has a variance of 0.001 or less. An Enhanced Serial Audio Interface (ESAI) subsystem and driver can also be used to transmit and receive digital audio from the audio codec. Further, parallel and/or serial digital interfaces (e.g., I2S, I2C) can be used between the audio codec and the processing device implementing the software stack 460. An open source audio platform, such as PortAudio, can also be implemented within the software stack 460 to provide audio processing within the user space 462. Further, contiguous memory allocators (CMEMs) can also be used, as well as SDMA (smart direct memory access) controllers. Other variations can also be implemented.

Interactive Music Server System Server Services

Where the MN embodiments described above provide the input/output of music for the user and other user input/control, the server provides one or more of the following server services: user registration, music session creation, pre-join session scoring, recording management, live broadcasting management, global session interface, and/or other server services. Different and/or additional server services can also be used or provided, and variations can also be implemented.

FIG. 5A is a block diagram of an example embodiment for an interactive music server system 102. As described herein, the server system 102 can provide one or more server services for the interactive music system 100 and the music sessions 150 for the music nodes 112, 114 . . . 116 as shown in FIG. 1. Looking to the example embodiment of FIG. 5A, the server system 102 includes a user registration module 502 that operates to provide user registration services, a pre-join session scoring module 504 that manages MN scoring for maintaining session quality, a session management module 506 that facilitates the creation and joining/leaving of music sessions, a live broadcast management module 508 that manages live broadcasts for the music sessions, a recording management module 510 that manages the movement of recordings among the session MNs, a global session control interface and management module 512 that manages the in-session controls selected by the various MN users, a tunes module 515 that provides features associated with the packaged tunes service described below, and/or other modules 514. For the example embodiment depicted, the server system 102 also includes a database system 520 that is used by the control module 516 and the other modules to store data associated with the operation of the interactive music system 100, including the server systems and the music nodes. For example, the database system 520 stores session information 522, recordings 524 for the sessions, registration information 526, scoring information 528, and/or other information 530. The operation of example modules for the server services is described in more detail below.

It is noted that one or more server systems (e.g., server systems 104, 106 . . . in FIG. 1) can also be used to implement the functional modules for server system 102 in FIG. 5A and described herein. These functional modules can also be distributed among the server systems being used, as desired. Further, multiple server systems can perform similar functions, and load balancing can be used to distribute workloads for the interactive music system 100 among the different server systems. Similarly, the database system 520 can be implemented using one or more data storage devices, and these data storage devices can be internal to or external from the server system(s), as desired. For example, the database system 520 can be implemented using internal hard drives, external hard drives, a RAID (redundant array of independent drives) system, network attached storage, and/or any other desired data storage device(s) that provide non-transitory data storage mediums. Other variations could also be implemented while still utilizing one or more server systems and related database systems to provide the server services described herein.

FIG. 5B is a block diagram of an example hardware embodiment for server system 102. A system bus 560 provides communications between the different subsystems and components of the server system 102. One or more processor(s) 568 communicate with the network interface 564 using one or more communication paths, with IO subsystems 562 using one or more communication paths, with non-volatile storage system(s) 570, and with volatile memory 566 using one or more communication paths. In addition to storing server services data, as described above, the non-volatile storage system(s) 570 can also store program instructions that are executed by one or more processor(s) 568 to implement the functions described herein for the server system 102. The non-volatile storage system 570 can be, for example, hard drives, optical discs, FLASH drives, and/or any other desired non-volatile storage medium that is configured to store information. Further, the volatile memory 566 can be, for example, DRAM (dynamic random access memory), SDRAM (synchronous dynamic random access memory), and/or any other desired volatile memory that is configured to store information while powered.

Functional blocks within FIG. 5A are now further described, although it is again noted that variations could be implemented for these functional blocks. It is further noted that APPENDIX A below describes additional embodiments and example details including MN registration, network communications, control messages, and other aspects for the interactive music system and for NAAS (Network as a Service) server systems that provide network communications for music sessions.

User Registration (502).

Each user registers with the server and creates an account. As part of this registration, users also provide certain meta-data such as the kind of instrument(s) they play, the location where they live, and/or other user data information. After registering, a user can access the server system, such as through a web browser and internet connection, and the user can sign in to the server services.

Music Session Creation and Management (506).

Once a user is signed in from a MN, the user is able to create music sessions. A music session is a server resource that a user may share with other users, inviting them to join and play music together or listen to music occurring in the session. A session can be a private session such that only the creator or members of the session may invite others to join or listen. A session can also be a public session such that it is listed on the server so that any user with a MN can discover and request to join or listen. The user creating the session can select whether or not to create the session as a public or private session, and this selection can also be changed once the session is created.

Pre-Join Session Scoring (504).

To help ensure that users have a positive experience when in a music session, the server can direct the MNs associated with requests to join sessions to perform one or more qualifying tests to provide scoring for the MNs requesting to join. The scoring results of these qualifying tests are sent by the MNs to the server. These qualifying tests can include, for example, reporting network latency information associated with the network latency between the MNs that would be involved in the session. The server then uses the result data passed back to allow the user to join the session, disallow the user from joining the session, provide a warning to the current session participants concerning the new user requesting to join the session, and/or take other actions based upon the results of the scoring process. For example, if the latency between the joining MN and one or more of the MNs that are already in the session is beyond a predefined threshold, the server may disallow the user from joining the session or warn the current session MNs but allow the MN to join. The current session MNs can also be given control of allowing or disallowing the new MN to join based upon the scoring results.

Recording (510).

The server can also store and subsequently manage access to recordings made by users in a session. This recording management can also include mechanisms for merchandising, sharing, or editing the session recordings.

Live Broadcasting (508).

The creator of a music session may also elect to live broadcast the session. The server manages access to the live broadcast stream according to the terms requested and/or selected by the user controlling the session. For example, the user can choose to have access to the live broadcast be paid access or free access, to set a limit for the number of listeners, to allow only invited users to listen, and/or to provide other terms associated with the live broadcast. The server also directs the MN to start/stop the broadcast, for example, to start the broadcast when there is at least one listener and to stop the broadcast when there are none.

Global Session Interface (512).

One particularly advantageous aspect of the interactive music system embodiments described herein is that the server provides MN users in a session with a common audio mixer view of all the live input and played-back music sources (tracks) at the MNs in the session, such as for example the embodiment for window 310 shown in FIG. 3D. The track controls (volume, mute, etc.) for any track within the session affect the track at the MN from which it originates. As such, a user at one MN can adjust tracks for the entire session, even though tracks may originate at one or more other MNs within the session, and these adjustments are sent as network communications to the other MNs. The other MNs receive these control messages and adjust their settings accordingly. This global session interface enables any user in the session to configure the track mix setting for the session. By providing a session global track control, the interactive music system simplifies the user experience. For example, even if only one user in the session has basic knowledge of audio mixing, a high quality final mix of the overall session can still be produced that is good enough for immediate broadcast, recording, and/or for the session musicians to appreciate the result of the in-session effort.
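
As one illustrative example (and not by way of limitation), such a global track control can be carried in a small message that any MN in the session may emit and that is routed to the MN where the track originates. The following is a minimal sketch assuming a simple serialized structure; the structure and field names are hypothetical and are not required by the embodiments described herein:

#include <cstdint>

// Hypothetical global session control message. Any MN in the session
// can emit this message; it is routed to origin_mn_id, which applies
// the adjustment locally so that all peers hear the adjusted mix.
struct TrackControlMessage {
    uint32_t session_id;    // music session the track belongs to
    uint32_t origin_mn_id;  // MN where the track originates
    uint32_t track_id;      // track being adjusted
    enum class Action : uint8_t { SetVolume, Mute, Unmute } action;
    float    value;         // e.g., volume level for SetVolume
};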

Example operational features and embodiments for the interactive music system will now be further described with respect to FIGS. 6A-C (session scoring), FIGS. 6D-E (adaptive throttling), FIGS. 7A-C (jitter queue), FIGS. 8A-C (recording), FIGS. 9A-B (distributed metronome), FIGS. 10A-D (virtual positioning), FIGS. 11A-B (concert broadcast), FIGS. 12A-B (large group session), FIGS. 13A-B (musician hinting), and FIG. 14 (songs/tracks/tunes service).

Session Scoring

Before a MN is allowed into a session, it is first qualified using session scoring. This pre-join session scoring helps to ensure that all users in the session have a good experience. The following discussion provides more detailed examples for the scoring process.

Latency Scoring and Thresholds.

Depending upon the beats-per-minute (BPM) used in a musical performance, the performing musicians can accommodate various amounts of audio latency and still have a qualitatively good interactive music experience. Latency here refers to the time it takes for sound to reach the participating musician after leaving the sound source. In free space, sound travels at approximately 0.34 meters per millisecond (m/ms). It is observed that generally the distance on stage at which musicians can participate at high BPM (e.g., about 160 BPM) without a director/conductor is about 8 meters. This distance represents a latency of about 24 ms (e.g., 8 m ÷ 0.34 m/ms ≅ 23.5 ms). If the BPM of the performance is lower (e.g., about 100 BPM), it has been shown that latency of up to about 50 ms (e.g., representing about 17 meters of separation) can be accommodated by musicians performing together on stage.

Latency between MNs within the interactive music system embodiments described herein includes: (1) transmit latency (T) including time to capture, encode, and transmit audio packets, (2) receive latency (R) including time to buffer (e.g., the jitter queue described below), decode, and play received audio packets, and (3) network latency (N) including time for audio packets to travel within a network between two MNs. If the capture, encode, and transmit latency for the sending MN is represented by T; the receiver jitter queue, decode, and play latency for the receiving MN is represented by R; and the one-way network latency from the sending MN to the receiving MN is represented by N; the total audio path latency or delay (D) for audio originating at the sender and arriving at the receiver can be represented as D=N+T+R.

As between one music node (MN_(i)) sending to another music node (MN_(j)), the delay (D_(i,j)) between these two nodes can be represented using the following equation:

D_(i,j) = N_(i,j) + T_(i) + R_(j)

where N_(i,j) is the network delay from MN_(i) to MN_(j), T_(i) is the transmit delay for MN_(i), and R_(j) is the receive delay for MN_(j). The maximum latency in the session (S_(delay)) can be represented by the following equation:

S_(delay) = ∀_(i,j) max(D_(i,j), D_(j,i))

wherein all music nodes (MN) in the session, as well as the audio paths to and from each pair of MNs, are considered to find the maximum session latency.

At a MN within the session, rather than treating the transmit latency differently from the receive latency, the latency can also be approximated by considering an average of the two. Thus, the latency (M_(x)) for a given music node (MN_(x)) within the session can be represented as M_(x) = (T_(x) + R_(x))/2. Similarly, it can be approximated that different MNs (MN_(x), MN_(y) . . . ) have similar characteristics (e.g., M_(x) ≅ M_(y)) so that the latency (M) can be approximated for the MNs within a session such that M_(x) ≅ M_(y) ≅ M.

If D_(max) is a maximum allowed music delay threshold for a session, then the latency between any two music nodes (MN_(x), MN_(y)) should be less than D_(max) to maintain a good user experience within the session. As such, it is desirable that the following equation be satisfied: (N_(x,y) + 2M) ≦ D_(max). This expression can be rewritten as 2N_(x,y) ≦ (2D_(max) − 4M). The network ping between the two music nodes can be represented as PING_(x,y) = 2N_(x,y), assuming the network delay time is about the same in both directions (e.g., N_(x,y) = N_(y,x)). Substituting into the previous expression, the following equation can be used to assess whether or not to allow a new MN into a session:

PING_(x,y) ≦ 2(D_(max) − 2M), or

PING_(x,y) ≦ 2(D_(max) − NodeLatency), or

½(PING_(x,y)) + NodeLatency ≦ D_(max)

where it is assumed that 2M = (T+R) = NodeLatency. Thus, a determination of whether a MN should be allowed to join a session can be based upon a predetermined node latency (e.g., transmit latency (T) plus receive latency (R)) and a predetermined maximum delay (D_(max)) along with a network ping test result between the two nodes (PING_(x,y)). The condition, therefore, can be used to filter the music nodes that are allowed into a session.
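
The admission condition above maps directly to a simple filter. The following is a minimal sketch, assuming latencies measured in milliseconds and ping test results between the candidate MN and each MN already in the session; the function and parameter names are illustrative only:

#include <vector>

// Hypothetical pre-join filter applying the condition
// 1/2*PING(x,y) + NodeLatency <= D_max against every MN already
// in the session.
bool may_join_session(const std::vector<double>& ping_ms_to_members,
                      double node_latency_ms,  // assumed NodeLatency = T + R
                      double d_max_ms) {       // maximum allowed music delay
    for (double ping_ms : ping_ms_to_members) {
        if (0.5 * ping_ms + node_latency_ms > d_max_ms) {
            return false;  // this audio path would exceed D_max
        }
    }
    return true;
}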

FIG. 6A is a swim lane diagram of an embodiment 600 for latency scoring for two music node (MN) client systems (MNA and MNB) and a server. First, both MNA and MNB sign on to the server. Next, the server communicates with MNB to prepare MNB to do a latency test with MNA. The server also communicates with MNA to prepare MNA to do a latency test with MNB. The server then initiates a ping count loop for both MNA and MNB. MNA then sends the results of its latency test for MNB to the server, and MNB similarly sends the results of its latency test for MNA to the server. As described herein, the server can use these scoring results to determine whether or not MNA and MNB will be able to interact in a music session with latency below a threshold selected to provide a positive user experience. If the latency test results indicate latency scoring that does not meet the selected thresholds, then appropriate actions can be taken as described herein, such as not allowing MNB to enter a session created by MNA, issuing a warning to MNA that allowing MNB may degrade performance beyond acceptable levels, and/or any other desired action. Variations can be implemented as desired, and example variations are described below.

Latency Scoring Optimization.

To improve the speed at which latency between a given set of MNs is calculated, one or more of the following optimizations can also be utilized: caching, distance filter, network correlation, updating, and/or other optimization determinations. In part, these techniques include estimating expected latency without requiring the MNs to initiate and respond to ping tests, as this ping testing can itself significantly slow down the MNs as the number of MNs within the system increases.

Caching.

If latency scoring between a given pair of MNs (A, B) was recently calculated, that cached result can be used instead of asking the nodes to perform new latency probes.

Distance Filter.

A distance filter can be applied using a geographic IP (Internet Protocol) address database. For consumer class internet network services, the observed network latency generally approximates to a one-way delay of 30 miles per millisecond, or 15 miles per network ping millisecond, as the network ping includes transmit and return paths. By using the IP address of the MNs and a GEO IP database, the longitude and latitude of the MNs can be determined. The terrestrial distance between MNs can then be computed, and internet latency can be approximated. For example, if a network ping time of 30 ms is used as a threshold network latency, then this translates to about 450 miles of allowed geographic separation (e.g., 15 miles per ping ms * 30 ms = 450 miles). The current approximate geographic limit, therefore, is under about 500 miles, assuming 30 ms of network latency is allowable for a good user experience by the MNs. Thus, it is expected that users that have distances of more than 500 miles between them are unlikely to have a good interactive music experience, as the latency will be too great.
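
The following is a minimal sketch of such a distance filter, assuming a GEO IP lookup is available; the haversine distance and the 15-miles-per-ping-millisecond rule of thumb follow the discussion above, while the lookup function itself is a hypothetical placeholder:

#include <cmath>

// Hypothetical GEO IP lookup returning latitude/longitude in degrees
// (placeholder; an implementation would consult a GEO IP database).
struct GeoPoint { double lat_deg; double lon_deg; };
GeoPoint geo_ip_lookup(const char* ip_address);

// Great-circle (haversine) distance in miles between two points.
double distance_miles(GeoPoint a, GeoPoint b) {
    const double kPi = 3.14159265358979;
    const double kEarthRadiusMiles = 3958.8;
    const double kDegToRad = kPi / 180.0;
    double dlat = (b.lat_deg - a.lat_deg) * kDegToRad;
    double dlon = (b.lon_deg - a.lon_deg) * kDegToRad;
    double h = std::sin(dlat / 2) * std::sin(dlat / 2) +
               std::cos(a.lat_deg * kDegToRad) *
               std::cos(b.lat_deg * kDegToRad) *
               std::sin(dlon / 2) * std::sin(dlon / 2);
    return 2.0 * kEarthRadiusMiles * std::asin(std::sqrt(h));
}

// Estimated network ping using ~15 miles of separation per ping ms.
double estimated_ping_ms(const char* ip_a, const char* ip_b) {
    return distance_miles(geo_ip_lookup(ip_a), geo_ip_lookup(ip_b)) / 15.0;
}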

Network Correlation.

If the IP address of a first MN (A) corresponds to that of a second MN (B), the two MNs are served by the same ISP (internet service provider), and they are in the same local geographic area (e.g., same city and/or zip code), then if the latency from the first MN (A) to a third MN (C) is known, the system infers that the latency from the second MN (B) to the third MN (C) will be similar and uses that scoring data.

Updating Latency Cache with Actual Latency.

The above estimates or proxies for latency are updated when the nodes actually join a session. Once joined, the actual latency between the MNs is observed and passed to the server. The server then uses this data to refine the accuracy of its latency estimation optimization. If a user is invited explicitly to a session, then the latency of the user is not used to filter them. However, the server system can warn the new user or the current session members of high network latency if the distance or latency between the new user and any MN in the session is large. The server system also warns users periodically during a session that the network condition is unfavorable if the latency between one MN and its peers rises and stays beyond a threshold.

As indicated above, as a MN comes online or requests to join sessions, the server directs it to perform latency probes with other MNs. The MN may be dormant (e.g., not in a music session) or active (e.g., in a music session). If the MN is in a session, the server is careful to control the rate at which it asks the MN to do probes, as the latency probe process may negatively affect the user's network capacity, thereby degrading the interactive audio experience. New latency probe data that is acquired by the server is then used to refresh the server latency cache.

Latency Probe with Proxy Server.

In some cases, a MN will communicate with the network through a proxy server. In this case, the overall network latency is the network latency from the MN wanting to join the session to the proxy server, plus the maximum latency from the proxy server to the MNs that the joining MN wants to communicate with as part of a music session.

Client Decoding Capability in Scoring.

In addition to network latency, the decoding capability of the MN that is joining the session plays a role in impacting the session experience of all users. The compute capability of a MN directly correlates to how many audio streams it can concurrently decode and then process into the resulting audio such that the real-time requirements of the system are maintained. A MN is said to be "K" stream capable if K is the maximum number of audio streams it can concurrently decode and process in real-time. If a user with a MN having a decode capability of K streams tries to join a session with more than K streams in it, the user will not be allowed to join and/or a warning will be issued. Similarly, it is noted that the MN with the lowest K stream capability within a session in effect limits the session to no more than K participant streams without degrading the session.
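
A minimal sketch of this capability gate follows, assuming the server tracks the number of streams currently carried in a session; the names and the warn/deny selection are illustrative only:

// Hypothetical K-stream capability gate applied at join time: a MN that
// can concurrently decode at most k_capability streams may not join a
// session already carrying more streams than that.
enum class JoinDecision { Allow, Warn, Deny };

JoinDecision check_stream_capability(int k_capability,
                                     int session_stream_count,
                                     bool warn_instead_of_deny) {
    if (session_stream_count <= k_capability) {
        return JoinDecision::Allow;
    }
    return warn_instead_of_deny ? JoinDecision::Warn : JoinDecision::Deny;
}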

Edge Network Scoring.

Currently, for lowest audio latency, a MN will preferably need to send audio packets to its peers every 2.5 ms, or 400 times per second. In a session that has X participants and that is fully peer-to-peer (P2P), every MN will transmit (X−1)*400 packets per second. Similarly, it will receive (X−1)*400 packets per second. This implies that the user's network (e.g., home network router or other network interface) must be able to support a full duplex packet rate of 800*(X−1) packets per second. In a session with five (5) MNs, therefore, this produces 3200 packets per second. Current technology in some home routers and wireless network access points (e.g., Wi-Fi) is unable to support this kind of throughput.

Similarly, as the number of MNs in a P2P session grows, the uplink bandwidth grows linearly with the number of participants. For many users on broadband networks provided by internet service providers (e.g., cable companies, phone companies, etc.), the downlink bandwidth is significantly higher than the uplink bandwidth. For a MN to send a 256 kilobits per second (kb/s) audio stream at 400 packets per second with UDP (User Datagram Protocol) formatting requires 380 kb/s of bandwidth. If a user has an uplink bandwidth of 1 megabit per second (1 mb/s), this uplink bandwidth clearly limits the number of P2P connections to other MNs that the user MN can have to at most two MNs at this audio bit rate. By using a lower audio bit rate of about 96 kb/s, the per-stream uplink bandwidth falls to 220 kb/s. With this lower bit rate, therefore, the same user can potentially accommodate four P2P MNs in a session.
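
The packet rate and bandwidth arithmetic above can be captured in a few lines. This is a minimal sketch under the assumptions stated in the text (400 packets per second per stream, and per-stream uplink costs of 380 kb/s at a 256 kb/s audio rate or 220 kb/s at a 96 kb/s audio rate); the function names are illustrative:

// Full-duplex packet rate a user's network must sustain in a fully
// P2P session of x participants: (x-1)*400 sent plus (x-1)*400
// received, i.e., 800*(x-1) packets per second.
int required_packet_rate(int participants) {
    return 800 * (participants - 1);
}

// Maximum number of P2P peers an uplink can feed at a given
// per-stream bandwidth cost.
int max_peers_for_uplink(double uplink_kbps, double per_stream_kbps) {
    return static_cast<int>(uplink_kbps / per_stream_kbps);
}

// Examples from the text: required_packet_rate(5) == 3200;
// max_peers_for_uplink(1000.0, 380.0) == 2; and
// max_peers_for_uplink(1000.0, 220.0) == 4.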

The packet rate limit or bound for a user is often reached before the bandwidth limit or bound for the user. Either way, however, by pre-scoring the user's network, the interactive music system is able to filter whether a MN may join a session without adversely affecting the user experience within the session. For example, the creator of the session may set a criterion that only MNs that can support streaming audio at a bit rate of X or greater and a packet rate of 400 packets per second to all peers within the session may join the session. The server uses these filters in conjunction with the MN packet and bandwidth scores to determine session admission.

MN Packet Rate Scoring.

As one example, the MN packet rate scoring is performed as follows. The MN connects to a scoring server hosted by one or more server system(s) through the network 110. The scoring server sends UDP test packets at a high rate of K packets per second for some duration T, where K is a multiple of 400 or some other selected number. The payload of the test packets represents that of a session music payload, for example, a session music payload at 128 kb/s aggregated with that of a chat stream at 40 kb/s. At the end of the interval T, the MN reports to the server how many packets it received. If the MN reports receiving 95% or more of the packets (or some other selected threshold), it then requests another scoring session with the server but with twice as many packets per second as were sent previously. This continues until the MN reports to the server receiving less than 95% of the packets sent by the server (or some other selected threshold).

The downlink channel packet rate (D_(RATE)) is then determined by multiplying the final server packet rate by the percentage of packets received by the MN in the last cycle. Next, the uplink capacity of the client is determined. The server directs the MN to send packets to it at a rate of K for T seconds. At the end of T, the server reports to the MN how many packets it received. If the server reports receiving 95% or more of the packets sent by the MN (or some other selected threshold), the MN will double its send packet rate to the server on the next cycle. When the uplink receive rate at the server is less than 95% (or some other selected threshold), the uplink channel rate (U_(RATE)) is computed by multiplying the final packet send rate of the MN by the percentage of packets received at the server in the last cycle.

Next, the concurrent channel packet rate is computed. The server and the MN each send packets concurrently for T seconds. The server sends at D_(RATE) and the MN sends at U_(RATE). If the server receives U percentage of the packets from the MN and the MN receives S percentage of the packets from the server, the effective channel packet rate capacity (C) of the MN network connection in a music session can be given as two times the minimum of S times D_(RATE) or U times U_(RATE), which can be represented by the equation: C = 2*min(S*D_(RATE), U*U_(RATE)). The channel packet rate capacity (C), for example, can be used as the MN packet rate score.
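
The scoring sequence above can be sketched as a doubling search followed by the concurrent test. This is a minimal illustration, assuming a hypothetical probe routine that sends test packets at a fixed rate for a fixed duration and returns the fraction received at the far end; the names are illustrative only:

#include <algorithm>

// Hypothetical transport probe (placeholder): send 'rate_pps' packets
// per second for 'seconds' and return the fraction received (0.0-1.0).
double run_probe(int rate_pps, int seconds);

// Doubling search used for both downlink and uplink scoring: double
// the packet rate until less than 'threshold' (e.g., 0.95) of the
// packets arrive, then scale the final rate by the last fraction.
double score_channel_rate(int start_rate_pps, int seconds, double threshold) {
    int rate = start_rate_pps;
    double received = run_probe(rate, seconds);
    while (received >= threshold) {
        rate *= 2;
        received = run_probe(rate, seconds);
    }
    return rate * received;  // yields D_RATE or U_RATE
}

// Concurrent test: with S and U the fractions received at the MN and
// the server respectively, C = 2 * min(S * D_RATE, U * U_RATE).
double channel_capacity(double d_rate, double u_rate, double s, double u) {
    return 2.0 * std::min(s * d_rate, u * u_rate);
}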

FIG. 6B is a swim lane diagram of an example embodiment 610 for MN packet rate scoring. The MN signs on to the server. First, the downlink packet rate communications occur between the MN and the server. The downlink packet rate result is then sent from the MN to the server. Next, the uplink packet rate communications occur between the MN and the server. The uplink packet rate result is then sent from the server to the MN. Finally, the concurrent packet rate communications occur between the MN and the server. The concurrent downlink packet rate result is then sent from the MN to the server, and the concurrent uplink packet rate result is then sent from the server to the MN. The final packet rate scoring result is then determined by the server and/or the MN.

MN Bandwidth Scoring.

Similarly, to determine the MN channel bandwidth score, the sequence described above is repeated, but this time large payload test packets are used to determine an effective downlink throughput (B_(DOWN)) and uplink throughput (B_(UP)), for example, in terms of megabits per second (mb/s). These rates are determined by the largest bandwidth needed at a MN to support the largest expected number of concurrent users in a session with all features of the service in play (e.g., video, music, messaging, etc. enabled). At the end of the bandwidth scoring, the MN downlink bandwidth (D_(BW)) is computed, and the uplink bandwidth (U_(BW)) is computed.

FIG. 6C is a swim lane diagram of an example embodiment 620 for MN bandwidth scoring. The MN signs on to the server. First, the downlink bandwidth communications occur between the MN and the server. The downlink bandwidth result is then sent from the MN to the server. Next, the uplink bandwidth communications occur between the MN and the server. The uplink bandwidth result is then sent from the server to the MN. Finally, the concurrent bandwidth communications occur between the MN and the server. The concurrent downlink bandwidth result is then sent from the MN to the server, and the concurrent uplink bandwidth result is then sent from the server to the MN. The final bandwidth scoring result is then determined by the server and/or the MN.

Adaptive Packet Rate Throttling.

If a MN's network environment score (e.g., packet rate scoring, bandwidth scoring) indicates that it can support only P packets per second and the number of MNs in the session is K, the MN can send audio packets at a first packet rate as long as the MN can support a packet rate (P) above a selected threshold, such as for example 400 times per second, such that the following threshold condition remains true: P ≧ 2*400(K−1). When the threshold condition becomes false, the MN switches to a lower packet rate, such as for example to 200 packets per second by aggregating two audio frames (e.g., two 2.5 ms audio frames) within a single packet. The MN can also inform its peers to send packets to it at a lower rate, although it may throttle the send and receive rates independently. In the case where both send and receive rates are throttled back to 200 packets per second, such as when P ≧ 2*200(K−1), the system may further throttle the packet rate by aggregating more audio frames in a single packet, such as four audio frames (e.g., four 2.5 ms audio frames) in a single packet. Further aggregations and packet rate reductions could also be used.

While the process of aggregating packets adds latency, the packet rate and overall bandwidth are reduced. At 200 packets per second, for example, the MN has 2.5 ms more latency relative to 400 packets per second. At 100 packets per second, the MN has 7.5 ms more latency relative to 400 packets per second. If the end-to-end latency is still within the desired limits, packet rate throttling is an effective mechanism for extending the possible set of MNs that may participate in a session. If T_(max) is the maximum allowed latency in the session and T is the latency of the session before the packet rate down throttle, then the down throttle is allowed if (T_(max) − T) is greater than the additional latency caused by the packet rate down throttle.
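
A minimal sketch of this down-throttle check follows, assuming 2.5 ms audio frames and the latency penalties noted above; the names are illustrative:

// Added latency from aggregating 2.5 ms frames into one packet:
// 1 frame/packet (400 pkt/s) adds 0 ms, 2 frames (200 pkt/s) adds
// 2.5 ms, and 4 frames (100 pkt/s) adds 7.5 ms.
double added_latency_ms(int frames_per_packet) {
    return (frames_per_packet - 1) * 2.5;
}

// Down-throttle is allowed only if the session latency headroom
// (T_max - T) exceeds the latency the aggregation would add.
bool may_down_throttle(double t_max_ms, double t_ms, int frames_per_packet) {
    return (t_max_ms - t_ms) > added_latency_ms(frames_per_packet);
}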

It is further noted that as the number of MNs grows, the MN can adaptively down throttle the send or receive packet rates. Conversely, as the number of MNs in the session declines, the MN can adaptively up throttle the packet send or receive rates as well. It is further noted that if the server system is used as a proxy, as described below with respect to the NAAS (Network as a Service) embodiments, the uplink and downlink packet rates from a MN can become invariant to the number of MNs in the session.

FIG. 6D is a process flow diagram of an example embodiment 630 for adaptive throttling of frame size when an MN leaves or joins a music session. When an MN leaves or joins, a new packet rate is determined for the remaining MNs. If the rate meets latency requirements, then a determination is made whether the framesize can be reduced. If the framesize is changed, then the rate is again checked. If the rate is not satisfactory, then a determination is made whether to increase the framesize. If the framesize is changed, then a new packet rate is again determined. If not, then the new MN is rejected for the session. Once a new framesize is selected and approved, the new framesize is communicated to all MNs in the music session, and the new MN is accepted into the session.

FIG. 6E is a process flow diagram of an example embodiment 640 for adaptive throttling of bandwidth (BW). If a difference in receive BW and send BW is detected, then a determination is made whether the communications are stable. If not stable, then bandwidth is down-throttled. If stable, then a check is made to determine if BW can be up-throttled. If a change is made, the communications are sent to adjust the MN bandwidth.

Jitter Queue

As audio packets traverse the network, jitter (variability in the inter-arrival time at the receiver) is introduced. As the audio play out preferably happens at a constant rate, packets are buffered through a jitter queue within the MN and then dequeued and played at a constant rate.

Classically, a jitter queue preferably buffers enough packets to account for the longest expected inter-arrival delay or jitter, thereby ensuring that the play out (e.g., audio output information ultimately heard by the user) does not starve once it has begun. When a play out does starve, the typical results are sound artifacts in the play out. The ideal low-latency audio jitter queue is considered herein as one where the buffer for the jitter queue always drains to zero at least once, but does not starve, in a predefined window of time. Satisfying this condition helps to guarantee that audio latency is not built up on the jitter queue, and this condition can be represented by the expression: JQ_(MIN) = 0 during time T, where JQ_(MIN) represents the minimum number of packets in the jitter queue during a time duration represented by T.

It is noted that a time duration T of one second or less is a preferable threshold to be achieved for the jitter queue reaching zero in order to preserve a low-latency and high-quality audio experience. Other values for the time duration T could also be selected, if desired.

If the jitter queue does not reach zero during the time duration T (e.g., JQ_(MIN) ≠ 0 during time T), then a buildup of latency can be deemed to be occurring, as some packets will not be processed within the time period T. To avoid this condition, the MN can discard packets from the jitter queue in one or more of the modes described in more detail below.

Further, if packets are discarded from the jitter queue in one interval T_(i), and the queue then starves in a subsequent interval T_(i+1), this subsequent starving can be used to indicate that the monitor time window T is not aligned with the packet variances that are occurring in the interactive music system.

FIG. 7A is a representative diagram of an embodiment 700 for a jitter queue that buffers audio frames for play output. The x-axis represents time, and the y-axis represents packets within the jitter queue. The first time window (T1) includes a spike in the number of packets that is potentially limited by the jitter queue depth (e.g., the total number of packets that can be stored in the jitter queue). As described below, any remaining packets within the jitter queue at the end of the time period (T1) can be discarded. During the second time window (T2), the portion of the diagram where low numbers of packets are within the jitter queue indicates where the jitter queue is close to being starved. At the end of time period (T2), the packets remaining in the jitter queue can again be discarded. As described herein, an ideal time window is one where the jitter queue reaches zero at least once with minimal starve and discard at the end of the time period. An example ideal window is indicated for embodiment 700.

As the bursty nature of jitter is considered to be statistically random, one can only strictly avoid this situation by increasing the window of time T to a large value. However, this is not desirable for the following reason. If at the beginning of the window K packets were delayed within the network and had not yet been received, the jitter queue may starve. The play out buffer for the MN can be configured to play filler audio frames during the starved mode until the late packets arrive. If the late packets later arrive along with the rest of the subsequent packets in a timely manner, the jitter queue will always have K worth of extra packets on it, and the user will perceive this latency. To avoid this situation, the time duration T can be bounded, and frames remaining within the jitter queue at the end of the time window T can be discarded if the jitter queue did not reach zero within the time window T. The smaller the value of T initially, the more accurately this indicates low-latency playout. However, if the network is highly bursty, the system adaptively expands the window up to some threshold. If the network stabilizes after some time (indicated by low starves and high empty buffer counts), the system throttles down the window duration. If the queue did not reach empty during the interval, then the remaining frames are discarded.

FIG. 7B is a block diagram of an example embodiment 750 for a jitter queue. A frame buffer 752 receives input audio frames 754 and stores these input frames. The stored frames (F1, F2 . . . FN) 760, 762 . . . 764 are then output in FIFO (first-in-first-out) order as audio frames 756 unless discarded as discarded audio frames 758. The jitter queue frame controller 770 communicates with the frame buffer 752 to analyze the stored frames (F1, F2 . . . FN) 760, 762 . . . 764 and to provide control information to the frame buffer 752 including discard instructions. As described herein, the time window (T) can be used to determine when discard determinations are made for the stored frames (F1, F2 . . . FN) 760, 762 . . . 764, and this time window (T) can be dynamically adjusted by the time window adjuster 776 based upon the conditions of the stored frames (F1, F2 . . . FN) 760, 762 . . . 764. The time window (T) is provided to the discard selector 772, and the discard selector 772 generates discard instructions at the end of each time window (T). The discard instructions are provided from the jitter queue frame controller 770 to the frame buffer 752. Based upon the discard instructions, zero, one, or more of the stored frames (F1, F2 . . . FN) 760, 762 . . . 764 are discarded as discarded audio frames 758 and not provided as output audio frames 756. As described herein, the dynamic control of the jitter queue using the time window (T) and audio frame discards provides for reduced latency and improved user experience.

One embodiment for a low-latency adaptive jitter queue algorithm is shown below. The adaptive algorithm runs when there are no lost packets within the network transmission, as by definition if packets are being lost, the jitter queue will likely starve.

void jitter_end_of_window_process(jq_window t) {
    // Starved right after a window with discards: the window is too
    // short, so expand it (up to MAX_JITTER_WINDOW).
    if (jq[t].had_starve() && jq[t-1].had_discard()) {
        jq.EARLY_DISCARD_CNT.increment();
        if (jq.EARLY_DISCARD_CNT.count() > DISCARD_THRESHOLD &&
            jq.window_duration < MAX_JITTER_WINDOW) {
            jq.window_duration = jq.window_duration.increase();
        }
    } else if (jq[t].had_starve() == false) {
        // Balanced window: the queue reached zero with no starve and no
        // packet loss, so shrink the window toward MIN_JITTER_WINDOW.
        if (jq[t].had_no_packet_loss() == true && jq[t].min == 0) {
            jq.WINDOW_IS_BALANCED.increment();
            if (jq.WINDOW_IS_BALANCED.count() / jq.number_of_windows() >
                BALANCE_IS_GOOD_THRESHOLD) {
                if (jq.window_duration > MIN_JITTER_WINDOW) {
                    jq.window_duration = jq.window_duration.decrease();
                }
            }
        }
        // The queue never drained to zero: latency is building up, so
        // schedule discards for the next window per the discard policy.
        if (jq[t].had_no_packet_loss() == true && jq[t].min != 0) {
            if (jq[t].discard_policy == CLAMP_TO_ZERO) {
                jq[t+1].schedule_discards = jq[t].current_length();
            } else if (jq[t].discard_policy == CLAMP_TO_MIN) {
                jq[t+1].schedule_discards = jq[t].min;
            }
        }
    }
}

void packet_discard(jq_window t, audioPacket p) {
    if (jq.schedule_discards > 0) {
        if (can_discard(t, p)) {
            jq[t].discard.increment();
            jq.schedule_discards.decrement();
        }
    }
}

bool can_discard(jq_window t, audioPacket p) {
    // Discard candidates: quiet packets within a quiet sequence, loud
    // packets within a loud sequence, or any packet when too few packets
    // remain in the window to satisfy the scheduled discards.
    if (p.audioEnergy <= QUIET && jq[t].playoutSequenceIsQuiet()) return true;
    if (p.audioEnergy >= LOUD && jq[t].playoutSequenceIsLoud()) return true;
    if (jq[t].packetsToBeReceivedInWindow() <= jq[t].schedule_discards) return true;
    return false;
}

Low-Latency Jitter Queue Discard Policy.

The example algorithm above dynamically expands and shortens the jitter queue monitoring window (T) to find a window where the count of the number of times the jitter queue reaches a minimum of zero within the time window T (e.g., JQ_(MIN) = 0 during time T) occurs at a high rate, such as for example preferably at least 50% or greater of the play out input/output rate. The can_discard( ) function within the algorithm applies heuristics to decide if an audio packet is a good candidate for discarding. The can_discard( ) function is called when the algorithm determines that audio latency is building up on the queue and packets must be discarded. The example heuristics used are described below with respect to different discard heuristics: energy based discard, random distribution discard, linear discard, lump discard, and hybrid discard. Different and/or additional heuristics could also be utilized.

Energy Based Discard.

The sender of the audio frame also includes additional data indicating the power level, such as a VU (volume unit) level, of the energy of the audio encoded in the frame. The receiver then can use this energy level to decide, before decoding the frame, if this is a relatively silent or loud frame. If the frame is in a sequence of quiet or loud frames, it is a candidate for discard, and the system can either discard the frame without decoding (treating it as a lost packet) or decode the frame and discard the data. The latter approach is preferred, as the audio decoder is stateful and this leads to the best preservation of sound. However, it may be more efficient in terms of the receiver's computational capability to simply discard the packet and let the decoder recover its state by treating the discarded packet as lost.

Random Distribution Discard.

If K packets are expected to be received within the time window T and D packets are to be discarded within the time window, a random number generator of range K can be used, and packets can be discarded when the random number generator produces a number "i" such that i/K is less than or equal to D/K. As such, for the K packets received within the time window T, D of these K packets will be randomly discarded based upon the output of the random number generator.

Linear Discard.

If K packets are expected to be received within the time window T and D packets are to be discarded within the time window, a linear discard can be used such that packets are discarded at a ratio of D discards per K packets. As such, for the K packets received within the time window T, a packet is discarded every K/D packets, rounded down to the nearest integer.

Lump Discard.

If K packets are expected to be received within the time window T and D packets are to be discarded within the time window, a lump discard can be used such that D consecutive packets are discarded at once. As such, for the K packets received within the time window T, a consecutive group of D packets within the time window T is discarded together.

Hybrid Discard.

If K packets are expected to be received within the time window T and D packets are to be discarded within the time window, one or more of the above discard techniques, as well as other discard techniques, could be used in combination. For example, the energy based discard can be used in conjunction with one of the other discard methods. If the energy based discard and the lump discard methods were utilized, for example, the energy based discard could first be applied, and if it has not found candidate packets at the appropriate relative levels to discard and the time window is coming to a close, then the lump discard could be used to discard D packets in a lump discard.
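
The following is a minimal sketch of the random, linear, and lump policies, assuming K packets expected and D discards scheduled within the window; the function names are illustrative, and an energy based policy would additionally consult the per-frame energy level described above:

#include <random>

// Random distribution discard: each of the K packets in the window
// is discarded with probability D/K, so about D packets are dropped.
bool random_discard(int k_expected, int d_to_discard, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(1, k_expected);
    return dist(rng) <= d_to_discard;
}

// Linear discard: drop one packet every K/D packets (rounded down).
bool linear_discard(int packet_index, int k_expected, int d_to_discard) {
    int stride = k_expected / d_to_discard;
    return stride > 0 && (packet_index % stride) == 0;
}

// Lump discard: drop D consecutive packets starting at lump_start.
bool lump_discard(int packet_index, int lump_start, int d_to_discard) {
    return packet_index >= lump_start &&
           packet_index < lump_start + d_to_discard;
}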

Mismatch Sender/Receiver Packet Rates.

Let C be the audio capture rate at a MN input and P be the audio output play out rate. If two nodes MN_(i) and MN_(j) are in a session and C_(i) ≠ P_(j) or C_(j) ≠ P_(i), then the jitter queues at the receiver portions of these MNs will build up latency or starve, respectively. If it is assumed that C_(i) > P_(j), and because the input/output (IO) rates for a particular MN can be assumed to generally be matched, then it can also be assumed that P_(i) > C_(j). These assumptions mean that MN_(i) will be sending more frames to MN_(j) than it can play out, thereby causing latency buildup in the receiver portion of MN_(j). These assumptions also mean that MN_(j) will not send enough frames to MN_(i), causing the receive portion of MN_(i) to starve.

This situation is likely to occur because the IO subsystems of the MNs involved in a session may not all be matched. To gracefully handle this IO mismatch, the MNs share their IO rate information with other MNs within the session, thereby enabling them to understand whether, and how many, frame discard/insert operations they may need to execute per second in the audio path from each sending MN to each receiving MN. By knowing that frame insert is needed with respect to an audio path, the sending and/or receiving MN can intelligently choose the point to insert one or more audio frames, such as during quiet or loud audio sequences as described above. Similarly, by knowing that frame discard is needed with respect to an audio path, the sending MN or receiving MN can intelligently choose the point to discard one or more audio frames, such as during quiet or loud audio sequences as described above. It is further noted that the MN in an audio path that has the faster IO rate is preferably the MN to execute the discard/insert operations, as this MN would likely have greater processing capacity. However, either MN or both MNs within the audio path can execute discard/insert operations, if desired.
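
A minimal sketch of the per-path correction rate follows, assuming capture and play out rates expressed in frames per second; the sign convention is illustrative:

// Frames per second that must be inserted (positive result) or
// discarded (negative result) on the path from a sender capturing at
// 'sender_capture_fps' to a receiver playing out at
// 'receiver_playout_fps', so that the receiver's jitter queue neither
// starves nor builds up latency.
double correction_fps(double sender_capture_fps, double receiver_playout_fps) {
    return receiver_playout_fps - sender_capture_fps;
}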

Sender Queues and Rate Adjustments for Receivers.

It is desirable not to have the receiving MN starve of input audio packets or discard audio packets. For example, if the encoded audio stream process is stateful, these starve conditions and/or discard conditions can cause the MN to lose state and produce undesirable audio artifacts. To help ensure these starve and/or discard conditions do not occur at the receiving MNs, each receiving MN can be configured to inform each of the sending peer MNs what its IO rate is for processing received audio packets. For each receiving MN to which it is sending audio packets, the sending MN can then implement different send queues having different send rates, each queue being tuned to the receiving MN's expected IO rate for processing input audio packets. Input audio captured at the sending MN is then queued within respective send queues, and these send queues are set to have IO rates associated with the receiving MNs. The send queues can be implemented, for example, using decimator/interpolator blocks within the audio output paths for the sending MN to produce audio content that matches receiver IO rates. For example, decimators can decimate the audio content to reduce the output audio rate, and interpolators can extend the audio content to increase the output audio rate. The decimated/interpolated audio is encoded, packetized, and sent by the sending MN to the respective receiving MNs.

FIG. 7C is a block diagram of an example embodiment 770 for sending MNs having sending queues including decimator/interpolator blocks and encoder/packetizer blocks to adjust send rates for receiving MNs. As depicted, MNA 112 is sending input audio captured at MNA 112 to MNB 114, MNC 116, and MND 118 through network 110. MNA includes a decimator/interpolator for each MN to which it is sending audio packets. Each decimator/interpolator decimates the audio content or extends the audio content based upon IO rate information received from each of the other MNs. For example, MNB 114 communicates with MNA to provide information about the IO rate associated with its processing of received audio packets through its decoder/jitter buffer. Similarly, MNC 116 and MND 118 communicate with MNA to provide information about the respective IO rates associated with their processing of received audio packets through their decoders/jitter buffers. Using this IO rate information, MNA adjusts the decimator/interpolator for the receiving MN to account for the expected IO rate for that receiving MN. The output from each decimator/interpolator is then provided to an encoder/packetizer that encodes the audio data and packetizes it for transmission as audio packets through the network 110. The send rates to each of the peer MNs are therefore tuned for each of the receiving MNs, as represented by the dashed line to MNB 114, the dashed and dotted line to MNC 116, and the solid line to MND 118. Each of the other MNs 114, 116, and 118 can operate in a similar way as MNA 112 to provide tuned send rates to each of the other peer MNs within the music session. Further, the MNs can periodically send updated IO rate information to the other MNs during the music session so that the respective send rates from the other MNs to that MN can be updated during the music session. As such, the user experience is improved, as discard and/or starve conditions at the jitter buffers can be reduced and potentially eliminated through the use of sender queues and rate adjustments.
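
A minimal sketch of per-receiver send queues follows, assuming the sender learns each receiver's IO rate and resamples captured audio per receiver before encoding; the nearest-neighbor resampler is a trivial stand-in for a real decimator/interpolator filter, and all names are illustrative:

#include <cstdint>
#include <map>
#include <vector>

using AudioFrame = std::vector<float>;

// Trivial nearest-neighbor resampler standing in for the
// decimator/interpolator blocks: decimates when the receiver IO rate
// is below the capture rate and interpolates when it is above.
AudioFrame resample(const AudioFrame& in, double capture_fps, double receiver_fps) {
    double ratio = receiver_fps / capture_fps;
    AudioFrame out(static_cast<size_t>(in.size() * ratio));
    for (size_t i = 0; i < out.size(); ++i) {
        size_t src = static_cast<size_t>(i / ratio);
        out[i] = in[src < in.size() ? src : in.size() - 1];
    }
    return out;
}

class SenderQueues {
public:
    // Receivers report (and periodically update) their IO rates.
    void update_receiver_rate(uint32_t mn_id, double io_fps) {
        receiver_io_fps_[mn_id] = io_fps;
    }
    // Each captured frame is tuned per receiver before encoding,
    // packetizing, and sending.
    void on_capture(const AudioFrame& frame, double capture_fps) {
        for (const auto& [mn_id, io_fps] : receiver_io_fps_) {
            AudioFrame tuned = resample(frame, capture_fps, io_fps);
            // encode, packetize, and send 'tuned' to mn_id (omitted)
        }
    }
private:
    std::map<uint32_t, double> receiver_io_fps_;
};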

Recording

Writing the digital content of an audio stream to a file is referred to herein as recording. In a music session, any user may initiate a recording from a participating MN control interface, such as for example through the control window 310 depicted in FIG. 3D.

The record start command is sent to all the MNs in the session, and each MN records the following: (1) the audio input at each MN (R_(ai)), (2) the incoming audio stream from each peer MN (R_(as)), and (3) the master output. The audio input(s) at each MN (R_(ai)) is typically the highest fidelity audio source, as it has no encode/decode compression or transmission related artifacts such as packet loss, errors, and/or other degradations. The incoming audio stream from each peer MN (R_(as)) is a recording of what each user is hearing at their respective MN. The incoming audio stream from other MNs is received as the decoded version of the encoded stream sent by the original peer MN and includes all the artifacts from packet loss, errors, jitter queue discards/inserts, and/or other degradations. The master output is the mix (R_(m)) of the audio input at a MN and the remote input streams; this mix is played out at the MN such that R_(m) = ΣR_(as) + ΣR_(ai).

Fast Record Playback.

Each MN produces a set of recordings (R_(m), R_(as), R_(ai)) including the local recordings, the peer MN input recordings, and the master recording from a record command. At the record stop command, this set of files is available for immediate playback. These files represent the fast playback assets from recordings at an MN.

High Fidelity Playback.

Each MN in the session also uploads the high fidelity local input recording (R_(ai)) to the server. The server stores and distributes these high fidelity recordings to each of the MNs in the session. As the high fidelity recording (R_(ai)) corresponding to each peer input recording (R_(as)) is downloaded to a MN, the MN replaces the content of the lower fidelity file with the high fidelity source recording file (e.g., each R_(ai) replaces its respective R_(as) at each MN once received). At such time, the user at the MN may play back the session high fidelity audio either locally or from the server that mixes the audio of the high quality recordings. These high fidelity files represent the slow playback assets from the recordings at the MNs in the session, owing to the delay in getting audio pushed to the server and then downloaded to the MNs within the session. It is also noted that the MNs can also keep the low fidelity recordings (R_(as)), if desired, even though the corresponding high fidelity recordings (R_(ai)) have been downloaded to the MN. Further, it is noted that each MN can send its local high fidelity recording (R_(ai)) directly to the other MNs in the session rather than going through the server.

FIG. 8A is a swim lane diagram of an example embodiment 800 for session audio communications for three MNs (MNA, MNB, MNC) and a recording service including one or more server system(s). Once MNA, MNB, and MNC have signed on to a music session, they stream audio for their music tracks to each other as part of the music session. Any one of the MN users can then initiate a start for a recording. As depicted, MNA initiates a start for a recording. Each MN then records its local tracks and the other MN tracks as described herein. Any user can then initiate a stop of the recording. The high fidelity recordings made at each MN are then uploaded to the server. The MNs can then download the high fidelity recordings for the other MNs in the session from the server. Once these are downloaded to each MN, the MN notifies the user that high-quality or high-fidelity playback is available for the session recording. It is also noted that the high-fidelity recordings could be directly communicated between the MNs in the session, if desired.

FIG. 8B is a block diagram of an example embodiment 820 for a recording system. The embodiment 820 includes one or more input channel processors (ICPs) that process local audio inputs or loopback/peer audio inputs from network connections 825. The group ICP 821 captures audio inputs from one or more instrument inputs (e.g., guitar, keyboard, voice, etc.) and outputs transmit audio packets associated with this audio input. Group ICP 821 also provides high quality audio outputs 831 and 832 associated with the captured audio inputs for the music session. The group chat ICP 822 captures one or more chat audio inputs and outputs transmit audio packets associated with this audio input. The peer ICPs 826 and 827 receive de-multiplexed music session audio input packets from peer MNs and process those packets to produce low quality recording user audio streams 834 and 835. The ICPs 828 and 829 receive de-multiplexed chat audio information and can output chat audio. The audio controller 830 provides speaker output 833 and provides monitor and master mixer controls, as well as main and monitor speaker control and volume control. It is noted that each of the outputs 831, 832, 833, 834 and 835 are example audio output streams that can be selected for recording individually and/or in combination with each other.

FIG. 8C is a block diagram of an example embodiment 840 for a recording system and related recording service where sessions are stored by a server and by MNs. Each MN initially stores high quality recordings for its local tracks and low quality recordings for the tracks from the other MNs in the music session. The high quality recordings are then uploaded by the MNs to the server and stored by the server. These high quality recordings can then be downloaded to the MNs to replace the initial low quality recordings made for the tracks from the other MNs. Once these high quality recordings are downloaded to an MN, the MN will have high quality recordings for each track in the music session. The high quality and/or low quality recordings can be played back by an MN individually or in combination by a user of the MN. Until the high quality recordings are downloaded, playback uses the high quality recordings from the local MN tracks and the low quality recordings from the peer MN tracks. Once the high quality recordings are downloaded, the entire session recording can be played back at the MN using the high quality recordings.

Auto Mixing of Recording via Latency Compensation.

When the command to start a recording is initiated, there is a delay of at least the network delay between the sender and receiver before the recording command is actually started. Assuming the initiating MN_(A) is sending the record start command to MN_(B) and MN_(C), there are record start time delays (e.g., network delay plus processing delay) between MN_(A) and MN_(B) represented as t_(AB) and between MN_(A) and MN_(C) represented as t_(AC). Whereas the set of recordings (R_(m), R_(as), R_(ai)) started at MN_(A) are synchronized with each other, the high fidelity recordings at MN_(B) and MN_(C), namely RB_(ai) and RC_(ai), will have start times offset by at least the delays t_(AB) and t_(AC), respectively. Without accounting for this delay, a final cut recording (e.g., R_(FINAL)=ΣRA_(ai)+ΣRB_(ai)+ΣRC_(ai)) will produce music that is time skewed.

It is noted that mixing of audio is represented herein using the summation symbol: “Σ”. As one example, this audio mixing can be an average of the sum of the audio signals that have been normalized to a given range, for example, ±1.0 floating point values, 16-bit integers, 32-bit integers, or some other selected range. Audio mixing could also be implemented using additional and/or different techniques, as desired.
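As one illustration of the averaging approach just described, a minimal sketch follows; the function name and the use of NumPy are assumptions for illustration only:

```python
import numpy as np

def mix_streams(streams):
    """Mix equal-length audio streams normalized to +/-1.0 floats.

    Mixing is the average of the sum of the signals, which keeps the
    result within the +/-1.0 range regardless of the stream count.
    """
    stacked = np.stack(streams)          # shape: (num_streams, num_samples)
    return stacked.sum(axis=0) / len(streams)

# Example: mix a local input R_ai with two peer streams R_as.
r_ai = np.zeros(48000)                   # one second of silence at 48 kHz
r_as1 = 0.5 * np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
r_as2 = 0.5 * np.sin(2 * np.pi * 660 * np.arange(48000) / 48000)
r_m = mix_streams([r_ai, r_as1, r_as2])  # master output R_m
```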

Recording the network delay between MN_(A) (e.g., the record start initiator) and its peers MN_(B) and MN_(C) is a good first order approximation of the amount of time skew that is needed to bring the recordings into synchronization. However, the processing delay is not accounted for in this model.

Reference Clock Synchronization.

An accurate reference clock common to all MNs in the session and timestamps made at each MN at recording starts can be utilized to help provide this synchronization. Each MN uses the common reference clock to timestamp each recording start with that clock time. With this reference clock timestamp, the following example algorithm can then be used to produce the final mix (a code sketch of this algorithm follows the list below):

1. Sort the high fidelity recordings (RA_(ai), RB_(ai), RC_(ai)) by timestamp.
2. The latest (greatest) timestamp represents the recording that started last (t_(OLD)).
3. For each recording R_(ai), the delay (t_(Di)) relative to the latest start time is represented as t_(Di)=t_(OLD)−t_(STARTi), where t_(STARTi) is the record start time for R_(ai).
4. The delay (t_(Di)) is the time offset in recording R_(ai) that must be skipped to bring the recording into alignment with the recording having the latest start.
5. R_(FINAL) is then produced by discarding the delay (t_(Di)) worth of data from each recording within the set of recordings (RA_(ai), RB_(ai), RC_(ai)) that does not have the latest start time, and then reading and mixing audio from the files from a time that will now match the latest start time t_(OLD). When the first end-of-file is reached, the mixing process stops.
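For concreteness, the following is a minimal sketch of this alignment and mixing step, assuming the recordings are available as in-memory sample arrays paired with their reference clock start timestamps (the function name, the fixed 48 kHz sample rate, and the use of NumPy are illustrative assumptions, not part of the disclosure):

```python
import numpy as np

SAMPLE_RATE = 48000  # assumed common sample rate for all recordings

def align_and_mix(recordings):
    """Produce R_FINAL from (start_timestamp_seconds, samples) pairs.

    Each recording is trimmed by t_Di = t_OLD - t_STARTi seconds so that
    all recordings begin at the latest start time t_OLD, then mixed by
    averaging. Mixing stops at the shortest trimmed recording (the
    first end-of-file).
    """
    t_old = max(start for start, _ in recordings)          # latest start time
    trimmed = []
    for t_start, samples in recordings:
        skip = int(round((t_old - t_start) * SAMPLE_RATE)) # t_Di in samples
        trimmed.append(samples[skip:])
    n = min(len(t) for t in trimmed)                       # first end-of-file
    return np.stack([t[:n] for t in trimmed]).sum(axis=0) / len(trimmed)
```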

This common clock synchronization process enables auto generation of the final cut (R_(FINAL)). The MNs can also be allowed to manually calibrate the time offset, if desired.

As indicated above, the clock synchronization algorithm depends on the presence of a reference clock common to the MNs in the session. One method for implementing this is to use a distributed clock algorithm augmented with an algorithm to select a master node in the session. As such, each MN then runs a local reference clock that is calibrated to the elected master clock. The elected master clock then effectively serves as a time server. The music server can also provide a master clock and be used as the master node by the MNs for clock synchronization.

One technique that can be used to provide a common distributed reference clock for the MNs is the well-known Cristian's algorithm described in the article: Cristian, F., Probabilistic Clock Synchronization, Distributed Computing, (3):146-158 (1989). As one example, this technique works between a process (P) and a time server (S), such as a time server available through the internet. The process requests the time from the time server. After receiving the request from the process, the server prepares a response and appends the time (T) from its own clock. The process then sets its time to be the server time (T) plus half of the round-trip time (RTT) for the communication. This technique assumes that the RTT is split equally between the request time and the response time. Multiple requests can also be made by the process to the server to gain more accuracy, for example, by using the response with the shortest RTT. The process can determine the RTT, for example, by the difference in its local time between when it sends its request to the time server and when it receives the response from the server. Other variations and techniques could also be utilized.
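A minimal sketch of this exchange is shown below. The transport is abstracted behind a hypothetical request_time_fn() callable that performs one request/response round trip, and the multiple-probe refinement noted above is included:

```python
import time

def cristian_offset(request_time_fn, num_probes=8):
    """Estimate the clock offset to a time server via Cristian's algorithm.

    request_time_fn() is assumed to perform one request/response exchange
    and return the server time T (seconds). The probe with the shortest
    round-trip time (RTT) is kept, as it bounds the error most tightly.
    """
    best_rtt, best_offset = None, None
    for _ in range(num_probes):
        t_send = time.monotonic()
        server_time = request_time_fn()      # T from the server's clock
        rtt = time.monotonic() - t_send
        # Estimated server time "now" is T plus half the RTT.
        offset = (server_time + rtt / 2.0) - time.time()
        if best_rtt is None or rtt < best_rtt:
            best_rtt, best_offset = rtt, offset
    return best_offset  # add to the local clock to approximate server time
```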

Distributed Metronome

A metronome helps musicians keep playing in time, or in sync. In a distributed music session, the delay incurred if a single metronome were used makes such an option range from undesirable to impractical. Even if multiple metronomes are used, the skew in start times will cause them to be naturally out of sync, as illustrated in FIG. 9A.

FIG. 9A is a signal diagram showing metronome pulses associated with three different local metronomes that are based upon a single metronome pulse. Without the distributed metronome techniques described herein, each local metronome pulse will be offset based upon a different delay (d0, d1, d2) associated with that local music node.

A distributed metronome is therefore implemented to provide a local metronome at each respective location for the MNs in a session that is synchronized to a common reference clock in the session and that plays in synchronization with this common reference clock irrespective of the delay between the MNs. As such, the MN user hears only the output of the metronome from his/her own MN and not from any other metronome at the other MNs. Using the distributed metronome described herein, the start times are aligned as shown in FIG. 9B.

FIG. 9B is a signal diagram showing metronome pulses associated with three different local metronomes that have been synchronized. With the distributed metronome techniques described herein, the delay offsets (d0, d1, d2) associated with the local music nodes are aligned in time based upon a start time (T_(start)).

For the purposes of the recording timestamps described above, the MNs in a session already have a reference clock system that can be used for the distributed metronome. While creating a metronome using a processing device running software instructions has been done previously, the problem addressed by the interactive music systems described herein is how to ensure that when one MN user within a session starts or changes the settings of their metronome, all other metronomes for the MNs in the session will also start or be changed in synchronization. Once a local metronome is started at an MN, it is assumed that the clocks at the MN are accurate enough that the MN plays the correct BPM (beats per minute) requested by the user. Further, each MN can be set to a different BPM, if desired. The following describes an example process that can be used for the distributed metronome (a code sketch follows the list below):

1. Each MN knows the network latency between it and every MN in the session, as described above, and the maximum latency (t_(MAX)) for its peer-to-peer connections can be determined from these latencies.
2. Let the reference clock time for the MN at which the metronome start is initiated be represented by t_(REF). The initiating MN broadcasts a “metronome start” command to all peer MNs within the session indicating that the start time for the metronome is to be t_(START)=t_(REF)+2t_(MAX). Twice the maximum latency (2t_(MAX)) is used as a conservative approach, although a lower start time bound of t_(START)=t_(REF)+t_(MAX) could also be used, as well as other later start times.
3. A MN receiving the metronome start command waits until its reference clock time (t) is about the designated start time (e.g., t≅t_(START)). The accuracy of local clocks is typically on the order of ±1 ms. If the designated start time (t_(START)) is earlier than the current reference clock time (t) for the MN receiving the start command (e.g., t_(START)<t), then the command is late, and the receiving MN re-broadcasts a new start time with an increase to the 2× multiplier for its maximum latency (t_(MAX)) to compensate for the unexpected lateness of the command.
4. Every minute, each MN rolls over and starts a new count of metronome ticks. As such, the start time is important for the MNs to remain in sync.
5. If a user changes the BPM at his/her MN, a restart of the distributed metronome is broadcast through a new “metronome start” command. This restart helps to ensure synchronization between the MNs in the session after BPM changes.
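The following is a minimal sketch of this start scheduling process, assuming hypothetical ref_clock_fn() and broadcast_fn() callables standing in for the session reference clock and the “metronome start” command transport:

```python
import threading
import time

class DistributedMetronome:
    """Sketch of the metronome start scheduling described above."""

    def __init__(self, ref_clock_fn, broadcast_fn, bpm=120):
        self.ref_clock = ref_clock_fn    # session reference clock (seconds)
        self.broadcast = broadcast_fn    # sends "metronome start" to peers
        self.bpm = bpm

    def initiate_start(self, t_max, multiplier=2.0):
        # Conservative start: t_START = t_REF + 2 * t_MAX.
        t_start = self.ref_clock() + multiplier * t_max
        self.broadcast(t_start)
        self._run_at(t_start)

    def on_start_command(self, t_start, t_max, multiplier=2.0):
        if t_start < self.ref_clock():
            # Command arrived late: re-broadcast with a larger margin.
            self.initiate_start(t_max, multiplier + 1.0)
        else:
            self._run_at(t_start)

    def _run_at(self, t_start):
        def tick_loop():
            period = 60.0 / self.bpm
            n = 0
            while True:
                # Schedule each tick against the reference clock so that
                # all MNs stay aligned rather than drifting apart.
                wait = (t_start + n * period) - self.ref_clock()
                if wait > 0:
                    time.sleep(wait)
                print("tick")  # placeholder for the local metronome sound
                n += 1
        threading.Thread(target=tick_loop, daemon=True).start()
```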

It is noted that audio from the metronome is preferably played only to the local MN output. Further control is also provided at each MN to allow a user to determine whether the local metronome output is heard in one or both ears, for example, if headphones are being used. Further, metronome audio is also not recorded by default, although the MN can be set to record the metronome audio as well, if desired.

Interactive Virtual Positioning within Music Session

Musicians performing at a given location (e.g., a stage) receive sound in a fully immersive sense. Their sense of presence comes from the direction of the sound, based on their relative positions to each other and the acoustic properties of the location. The interactive virtual positioning embodiments described herein enable a reproduction of this immersive and presence experience by utilizing a number of existing technologies that are augmented as part of the interactive music system.

FIG. 10A is a diagram 1000 of sound location perception by a person hearing sounds from two sources (S1, S2). A first source (S1) is received at different times at two points (Y1, Y2) on a person's head based upon different travel distances (H11, H21) for the sound. Similarly, a second source (S2) is received at different times at the two points (Y1, Y2) on the person's head based upon different travel distances (H12, H22). Sound location perception of a person is based upon differences between sound paths striking the head and being sensed by the person.
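As a simple quantitative illustration of this path-difference cue, the arrival time difference at the two points can be estimated from the travel distances and the speed of sound (this small helper is illustrative only and not part of the disclosure):

```python
SPEED_OF_SOUND = 343.0  # meters per second in air at room temperature

def interaural_time_difference(h1, h2):
    """Arrival time difference (seconds) between two points on the head.

    h1 and h2 are the travel distances (meters) from one source to the
    two sensing points (e.g., H11 and H21 for source S1 in FIG. 10A).
    """
    return (h2 - h1) / SPEED_OF_SOUND

# A source 0.17 m farther from one ear arrives roughly 0.5 ms later there.
print(interaural_time_difference(1.00, 1.17))  # ~0.000496 s
```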

Using this sound location perception, a three-dimensional definition of a virtual environment is generated for the session. Each MN, sound source, or other element within the session can be placed at a specific position within this virtual space. Based on the instrument type selected by a user, the user is provided with a set of pre-defined configurations, such as a sitting violinist or a standing violinist. If the MN has multiple inputs, the system allows the user to indicate how those inputs are positioned within the virtual space. For example, a keyboardist could use one input for positioning the keyboard instrument within the virtual space and one input for positioning the keyboardist's voice within the virtual space.

FIG. 10B is a diagram 1010 of example locations or positions (P) for music session elements within a virtual space. Each of the hexagons represents the position (P1, P2, P3, P4, P5, P6, P7) of an element, such as an MN, within the session. Each position will have a unique sound experience. For example, the perception at position P2 of sound generated from position P1 and position P3, as indicated by the arrows, will be different from the perception of this same sound at other positions, such as position P6. A virtual microphone array associated with each position, such as position P2, can be used to determine the sound received at that position.

For each location or position, a head-related transfer function (HRTF) is assigned for the user's virtual position. Because the geometry of the virtual room is known and the relative positions of the sound sources have well-defined three-dimensional (3D) coordinates, the HRTF can be used to compute the perception of sound presence that a user in that position would hear. Each position P represents a MN input and any other physical attribute of the source that is helpful to characterize the directionality of the sound that input produces (e.g., its sound field).

FIG. 10C is a diagram 1020 of an example dummy head 1022 that is depicted to a user and can be adjusted by the user to place and orient the user within the virtual environment for the music session. Based upon the position of the dummy head 1022, the dummy head 1022 will receive audio signals from other elements within the music session. These audio signals are then packetized for transmission or storage, as indicated by block 1024 and as described herein. The resulting audio can then be output to a listener as represented by head 1026.

The user at a MN is allowed to select their desired virtual position through manipulation of a dummy head representation in the virtual space or setting for the music session. This positional data is also sent to and shared with other MNs within the session. The user may also choose to upload their HRTF specific data or to select from a set of generic pre-configured profiles to upload.

MTB (Motion Tracked Binaural) System.

By emulating a virtual microphone array and using a head tracker, a motion tracked binaural (MTB) system can be provided to each virtual musician/listener in a session. A MTB system can be used to produce the most natural and immersive sense of presence for the musician/listener.

FIG. 10D is a diagram 1030 of an example dummy head 1032 that includes a virtual microphone array of two or more microphones. This dummy head 1032 can also be depicted to a user and can be adjusted by the user to place and orient the user within the virtual environment for the music session. Based upon the position of the dummy head 1032, the microphone array related to the dummy head 1032 will receive audio signals from other elements within the music session. These audio signals are then packetized for transmission or storage, as indicated by block 1034 and as described herein. The resulting audio is output to an interpolator 1040, which then outputs to a listener as represented by head 1036. However, the listener can also have a head tracker 1038 worn, mounted, or otherwise attached to the listener's head 1036 that tracks movements of the head 1036. The tracked movements are provided back to the interpolator 1040. The interpolator 1040 uses these tracked movements to adjust the output sound so that the listener's perception is that the listener is moving his/her head position within the virtual environment for the music session. As such, a virtual reality experience is provided for the listener within the virtual sound field for the performance within the music session.

The MTB system depicted in FIG. 10D, therefore, correlates the user's head position with the head position in the virtual space. Whereas a physical microphone array is used in a typical physical setting, an actual microphone array is not needed for the embodiments described herein, as each user directly controls the movement of his/her virtual head in the virtual space defined for the music session.
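One common way to realize such an interpolator, offered here as a sketch under assumptions rather than as the disclosed implementation, is to derive each ear's signal by linearly interpolating between the two virtual microphones nearest the tracked ear angle:

```python
import numpy as np

def mtb_ear_signal(mic_signals, ear_azimuth_deg):
    """Interpolate one ear's signal from a circular virtual mic array.

    mic_signals is a list of equal-length sample arrays for microphones
    spaced evenly around the virtual dummy head. The tracked head azimuth
    places the ear between two adjacent microphones, whose signals are
    blended in proportion to the angular distance.
    """
    n = len(mic_signals)
    spacing = 360.0 / n
    pos = (ear_azimuth_deg % 360.0) / spacing
    lower = int(pos) % n
    upper = (lower + 1) % n
    frac = pos - int(pos)
    return (1.0 - frac) * mic_signals[lower] + frac * mic_signals[upper]

# Example: 8 virtual microphones; left ear currently at 100 degrees.
mics = [np.zeros(480) for _ in range(8)]
left_ear = mtb_ear_signal(mics, 100.0)
```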

The MTB system can provide a variety of features. For example, a virtual space definition can be provided that models the acoustic properties of a virtual environment within which the music session is to virtually take place. A two-dimensional (2D) and/or three-dimensional (3D) graphical virtual position selection and placement mechanism for musician avatars can also be provided through each MN in the session. The user can also be allowed to adjust attributes of an avatar representing the user, including adjustments to height, number of microphones (e.g., sound sources), relative position of each microphone, and/or other desired attributes. A set of preconfigured musician attributes is also provided (e.g., drummer, pianist, guitarist, and/or other musician) and can be selected by the user. Further, once a performer/listener is positioned and assigned within the virtual space, the performer/listener may elect to listen to the session from another virtual position (e.g., an out-of-body experience) within the virtual space. This virtual positioning is useful to understand the sound a virtual user at that location in the virtual environment will receive. The system also remembers and uses the HRTF data uploaded or selected by a user, and this HRTF data is used in whatever virtual location the user selects.

The performer/listener position also provides positional information for the source of the audio in the virtual space. An acoustic processor for each MN can then use this data along with the VU (volume unit) level information to compute the direction and volume received at another position within the virtual space. The acoustic processor can also compute reflections and any emulated ambient noise (e.g., crowd noise) as well as other sound effects, as desired, and mix these effects into the audio heard by the user at the MN.

As part of the user interface, a user is allowed to select the HRTF that best approximates their physical and auditory characteristics and/or any other desired HRTF. This user selection can be provided through a graphical menu selection or by asking the user for some basic measurement information for his/her physical features (e.g., head size, ear positioning, etc.). Alternatively, the user can be given instructions on how to determine physical measurements (e.g., taking and processing pictures of themselves) so that their physical dimensions can be obtained. Also, if a user has his/her HRTF measurements taken professionally or these HRTF measurements are otherwise determined, these HRTF data can be uploaded to the MN or to the session server described herein. The server can store this data and send it to the acoustic processor for the user when the user is listening in 3D mode.

Concert Broadcast Modes

The live music produced in a music session may be broadcast. The following modes of broadcast can be utilized within the interactive music system embodiments: low latency live broadcast, high fidelity live broadcast, 3D virtual reality broadcast, 3D concert podcast, and/or other broadcast modes.

Low Latency Live Broadcast.

In this broadcast mode, the server system operates as a broadcast server and assigns one of the MNs in the session to serve as a broadcast stream provider. The assigned MN encodes the output audio for the broadcast and sends it to the broadcast server. The output audio encoded at the MN selected as the stream provider is a mix of the incoming peer streams from the other MNs in the session and its local audio input. As the peer audio streams are transmitted and processed with low latency as described above, the audio recovered from those streams may have the effects of packet loss, jitter queue starve/overflow artifacts, and/or other artifacts. As such, the low latency broadcast stream will also carry these artifacts, but will also be a relatively “instantaneous” representation of the live event being performed within the music session.

FIG. 11A is a block diagram of an example embodiment 1100 for a low latency live broadcast (e.g., a low-latency concert broadcast mode). At an MN, the local audio inputs captured by an instrument ICP and the peer audio packets received through the network are mixed together using a music mixer. The mixer output is provided as a speaker output for the MN and is also provided to an encoder for output to the network as a live broadcast. The server operates as a broadcast server and makes the live broadcast available for streaming through the network to one or more broadcast clients.

High Fidelity Live Broadcast.

In this broadcast mode, the input audio at each MN is encoded, packetized, and transmitted via a reliable network protocol, such as TCP (transmission control protocol), to the broadcast server. Each audio packet is also configured to carry a timestamp of the session reference/master clock. In the server, the audio frames are recovered, and the timestamps are used to synchronize the audio frames. The synchronized audio frames are then processed through a server audio mixer, and the resulting audio is encoded and broadcast. The server audio mixer could be a full function digital audio workstation (DAW), which can process the streams in a variety of ways, such as by adding audio effects, adding other audio tracks, and/or otherwise processing the streams. This cloud-based DAW can also be provided as a paid service that users may lease. The high fidelity streams can also be sent to a separate user-specified server that controls the mixing process and produces the audio stream to be broadcast.

FIG. 11B is a block diagram of an example embodiment 1120 for a high fidelity live broadcast mode (e.g., a high-quality concert broadcast mode). The high quality audio inputs captured at each MN are uploaded through the network to the server. The server decodes the audio frames from each MN with a frame decoder and mixes the audio frames together. Timestamps are added to the audio frames at each MN using a reference clock, and the server uses these timestamps to align the audio frames from each MN for purposes of mixing the audio frames together. An encoder receives the mixed output and generates a high quality audio stream output. The server then operates as a broadcast server to make this high quality live broadcast available for streaming through the network to one or more broadcast clients.

3D Virtual Reality Broadcast.

As described earlier, the system provides an interface where a virtual space is defined and the musicians are assigned or select positions within the virtual space. This virtual positioning can also be provided to users to allow the “purchase” of specific seats or locations in the virtual space for the performance. For example, a user can be allowed to select a position from which he/she would like to listen to the event. As described above, a binaural processor is embedded in the listen application, and the user provides or selects their HRTF data. Additionally, the user may use a MTB system that provides head tracking and therefore provides the ability to have an even more realistic experience. The high fidelity tracks may be relayed directly to the listener device for acoustic processing, or the acoustic processor instance may be a service on a server. The acoustic processor uses the HRTF and motion tracking data to produce a final stereo mix that is specific to that user.

It is noted that each performer's default position is what the session creator defines when the session is created. However, a listener is allowed the ability to “move” the performers in the virtual space. This movement provides a more personal experience to the user. A listener can also be assigned a fixed seat in the audience or can be free to “move” around. For example, a user who hears better from one ear than the other may elect to be on a particular side of the virtual space for the performance. The concert environment may also be fixed by the session creator, or the user may be allowed to change the concert locale or environment (e.g., change from Carnegie Hall to Madison Square Garden).

3D Concert Replay or Podcast.

The high fidelity tracks generated through the processes described above can be stored and replayed. As such, a user may have a 3D concert experience at any time through the stored audio tracks. For example, the stored 3D concert can be made available as a podcast that can be downloaded to a device, such as a tablet or phone, and replayed.

Large Group Music Session

In a purely P2P music session, the number of audio streams grows linearly with the number of participating MNs. This linear growth has three effects: (1) the bandwidth requirement grows linearly as the number of peer-to-peer MNs grows within the session, (2) at each MN, the number of audio decoder instances and the compute power requirement grow linearly, and (3) the user interface can become cluttered with large numbers of MNs.

To enable large groups (e.g., choirs, bands, orchestras, big bands, and other large musical groups) to interact in a music session with a good user experience, the following process can be used (a code sketch of the grouping steps follows the list below):

1. Each MN in the session determines a latency score with all other MNs in the session.
2. Each MN is tagged with a color representing the role the node will play in the session (e.g., red for violins, blue for trumpets, etc.).
3. The system sorts MNs in the session into groups based upon common parameters (e.g., color, latency, etc.). Let G_(i) represent the i^(th) group.
4. Intra-group audio, which is audio for MNs in the same group, flows as normal such that each MN peer sends audio packets to every other MN peer in the group, directly or via a proxy server.
5. Inter-group audio, however, is configured to flow in such a manner that cycles are not created. This cycle-free flow is controlled by using a spanning tree algorithm to create a cycle-free communication tree between the groups.
6. One MN in each group is used to communicate with another group. The pair of MNs that serves the role of connecting adjacent group A with group B in the spanning tree is preferably selected based on the minimum latency between nodes in the groups. FIG. 12A described below illustrates this, wherein MN2 in Group A and MN4 in Group B have been determined to have the lowest latency of all node-to-node connections between MNs in Group A and MNs in Group B after those connections have been probed.
7. The system max latency (S) is the highest audio latency. The system max latency (S) can be determined, for example, by performing an exhaustive breadth-first search from the MNs in the group session and summing the inter-group link latencies. If the maximum allowed latency in the interactive music system is T_(MAX), then the grouping of nodes is considered non-optimal if S>T_(MAX). If S≤T_(MAX), the grouping of nodes is accepted and can further be considered a final solution.
8. When S>T_(MAX), the system attempts to reduce latency by adjusting the groupings. For example, the color grouping constraint can be removed, and the system can place MNs in groups until the system finds a grouping that meets the desired latency threshold (e.g., S≤T_(MAX)). Many algorithms can be employed for this type of graph analysis to determine if a solution is possible. Because the number of nodes in the group session will typically be relatively small (e.g., tens of MNs), the computational processing needed to search for and/or solve for a grouping solution is not prohibitively expensive.
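The bridge selection and cycle-free tree construction in steps 5 and 6 can be sketched as follows; the data layout (a groups mapping and a pairwise latency table) is an illustrative assumption:

```python
import itertools

def select_bridges(groups, latency):
    """Pick the bridge MN pair for each pair of groups and grow a
    cycle-free inter-group tree (a simple Prim-style spanning tree).

    groups maps a group id to its list of MN ids; latency[(a, b)] is the
    probed latency between MNs a and b (keys assumed present for every
    cross-group pair). Returns a list of (mn_x, mn_y) bridge links.
    """
    # Best bridge pair and its latency for every pair of groups.
    best = {}
    for gx, gy in itertools.combinations(groups, 2):
        pair = min(((a, b) for a in groups[gx] for b in groups[gy]),
                   key=lambda p: latency[p])
        best[(gx, gy)] = (latency[pair], pair)

    # Always add the lowest latency link that connects a new group, so
    # the resulting set of links cannot contain a cycle.
    connected = {next(iter(groups))}
    links = []
    while len(connected) < len(groups):
        candidates = [(lat, pair, gx, gy)
                      for (gx, gy), (lat, pair) in best.items()
                      if (gx in connected) != (gy in connected)]
        lat, pair, gx, gy = min(candidates)
        links.append(pair)
        connected.update((gx, gy))
    return links
```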

FIG. 12A is a diagram of an example embodiment 1200 for MNs within two groups selected as bridges for inter-group communication. For the embodiment 1200, a first group (GROUP A) 1202 includes two music nodes (MN1, MN2) 1204/1206, and a second group (GROUP B) 1212 includes two additional music nodes (MN3, MN4) 1214/1216. MN1 1204 and MN2 1206 communicate with each other as part of GROUP A 1202, and MN3 1214 and MN4 1216 communicate with each other as part of GROUP B 1212. MN2 1206 is the bridge for GROUP A and communicates with MN4 1216, which is the bridge for GROUP B.

FIG. 12B is a diagram of an example embodiment 1250 for inter-group communications within a larger interconnected group (e.g., IMN clusters for a large group). For the embodiment depicted, four groups (GROUP A, GROUP B, GROUP C, GROUP D) are interconnected through clouds. Further, within each group, the interactive music nodes (IMNs) are also interconnected through clouds. It is also noted that the clouds represent one or more networks, such as network 110, through which network communications can occur.

The MNs that serve as bridges between groups are configured to perform additional functions. The incoming audio streams from peer MNs in the group (R_(as)) are decoded and mixed together by the bridge MN to form a group audio stream (R_(g)) such that R_(g)=ΣR_(as). The bridge MN is then responsible for sending this mix to the other group with respect to which it is acting as a bridge. The bridge MN must also send its own input audio I=ΣR_(ai) along two paths, namely to its intra-group MNs and to the bridge MN of the other group for which it is acting as a bridge.

MN2 in Group A and MN4 in Group B are described above as bridge MNs. The stream leaving MN2 in Group A for Group B through MN4 in Group B is represented as S_((A2, B4))=I_(A2)+R_(gA). Similarly, MN4 in Group B sends audio to Group A through MN2 in Group A, and this audio is represented as S_((B4, A2))=I_(B4)+R_(gB).

If the bridge node sends the audio input and intra-group audio as distinct audio frames (e.g., frames containing I_(A2) and frames containing R_(gA)), the receiving bridge MN can differentiate what is from the bridge MN and what is from the other MNs in the group. If the bridge node produces a final mix so that it sends only that mixed audio (e.g., frames containing S_((A2,B4))), the receiving bridge MN is unable to distinguish, and therefore control, the mix of the bridge node audio separately from its intra-group audio.

A bridge node also performs the role of receiving the audio from its peer bridge node and relaying that audio to its intra-group peers. So the audio output by bridge MN2 in Group A to its peers in Group A can be represented as G_(A2)=ΣR_(Ai)+S_((B4, A2)), where ΣR_(Ai) is the set of inputs at A2. Similarly, bridge MN4 in Group B relays audio from its peer bridge node along with its inputs to the peers in Group B, as represented by G_(B4)=ΣR_(Bi)+S_((A2, B4)), where ΣR_(Bi) is the set of inputs at B4.
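These stream compositions can be summarized in a small sketch; the mix() helper and the representation of frames as arrays or numbers are illustrative assumptions:

```python
def mix(streams):
    """Hypothetical helper standing in for the Σ mixing described above."""
    return sum(streams) / len(streams) if streams else 0.0

def bridge_outputs(local_inputs, intra_group_streams, peer_bridge_stream):
    """Compute the two outputs of a bridge MN (e.g., MN2 in Group A).

    Returns (inter_group_stream, intra_group_relay):
      S_(A2,B4): mix of I_A2 and R_gA, sent to peer bridge MN4.
      G_A2: mix of the A2 inputs and S_(B4,A2), relayed to Group A peers.
    """
    i_a2 = mix(local_inputs)                 # I_A2 = mix of local inputs
    r_ga = mix(intra_group_streams)          # R_gA = mix of group streams
    inter_group = mix([i_a2, r_ga])          # S_(A2,B4)
    intra_relay = mix([i_a2, peer_bridge_stream])  # G_A2
    return inter_group, intra_relay
```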

High Latency Inter-Group Bridge.

If A2 decodes S_((B4, A2)) and then mixes it with its inputs, it will process these packets through a jitter queue. The involvement of the jitter queue implicitly connotes a higher latency than if the packets were not decoded and mixed. However, doing this mixing will result in a single stream of audio packets going from A2 to its intra-group peers. This results in lower bandwidth than sending distinct packets. The peers also will not be able to distinguish A2 input audio from that which came from the other group for which A2 is a bridge.

Inter-Group Cut-Through Mode.

Rather than decode and mix the audio from the group stream, A2 may simply relay the packets to its group members. It may also aggregate its sending payload with the payload of packets received in the inter-group stream. This operation does not require the S_((B4, A2)) packets to be processed through a jitter queue and is therefore a lower latency operation. In this mode, the audio frames for inputs to A2 remain distinct from those of the relayed group for which A2 is a bridge. As such, the intra-group peer MNs can represent and control the mix of these streams distinctly. This mode uses higher bandwidth than the high-latency relay mode.

A similar analysis may be done for group B and node B4. The following can be concluded:

1. The outgoing inter-group peer stream mixes, namely S_((A2,B4)) and S_((B4, A2)) from bridge nodes A2 and B4 respectively, are produced from mixing the intra-group streams received at those nodes. Because these streams are processed through jitter queues, the output streams experience latency. It is also noted that there is no point in doing cut-through of these frames because cut-through would simply collapse the notion of groups.
2. The relay of inter-group audio to intra-group peers may incur no delay at the bridge node if cut-through mode is used. If not, the stream incurs jitter queue processing delay.

If K groups are along a communication path and the average jitter queue processing delay at the bridge nodes is JQ_(avg), then the added delay introduced in the session if cut-through mode is used at the bridge nodes is (K−1)JQ_(avg). If high latency mode is used, then the added latency is 2(K−1)JQ_(avg), with the added benefit of lower bandwidth. For example, with K=3 groups and JQ_(avg)=20 ms, cut-through mode adds 40 ms of delay while high latency mode adds 80 ms.

Large Group Director.

Generally, in a large musical performance, a director/conductor leads the large group. In this large group implementation, one MN is marked or designated as the session director. As described below, a MN performer may provide hinting status that is shown at MNs in the session. Hinting status allows a performer to send non-auditory cues to MNs in the session. Whereas only the intra-group members' hint status is shown in the session view at a MN, the director MN status is shown at all MNs in the session. Although inter-group hint status could also be shown, intra-group hints are typically what are of interest to musicians within a large group.

Musician Hinting within Music Session

When musicians are physically in the same space, they pass many non-verbal cues to each other. When immersed in a virtual environment as created by the interactive music system embodiments described herein, musicians will likely be unable to convey such cues effectively, even if video of themselves is streamed among them. As such, a hinting system and related hinting device can be used so that musicians can broadcast status/cues to their peers in the music session.

FIG. 13A is a block diagram of an example embodiment 1300 for a music hinting system that allows non-verbal cues to be communicated among MNs within a music session. For embodiment 1300, each MN includes a display 1302, 1312, and 1322, respectively, that displays information for its own music tracks and the peer music tracks within the music session. A visual hint element is also displayed for each MN within the music session. Looking to display 1302, for example, information for the MN1 track, the peer MN2 track, and the peer MN3 track is shown. In addition, a visual hint element is displayed for each of these tracks. Each visual hint element can be, for example, a circle or button image that visually changes (e.g., changes color, changes texture, changes brightness, etc.) based upon hint cues selected by the user. The other displays 1312 and 1322 can be similar to display 1302. Further, hinting devices 1304, 1314, and 1324 are coupled to each of the MNs, respectively, to provide hinting control for a user. As shown with respect to FIG. 13B, the hinting devices 1304, 1314, and 1324 can be, for example, a hinting device with pedals or buttons that are engaged or selected by a user, such as through the action of the user's foot. The hinting devices 1304, 1314, and 1324 communicate user hinting selections to the MNs, and these hinting selections cause changes in the visual hint elements. Each MN also communicates its hinting selections to the other MNs in the music session, and these hinting selections are used at each MN to adjust the visual hint elements associated with each MN, respectively.

FIG. 13B is a diagram of an example embodiment 1350 for a foot-controlled hinting device. This embodiment 1350 has two pressure sensitive pads as well as ten different selector buttons and control buttons (e.g., power, etc.). The hinting device electronically communicates with the MN using one or more wired or wireless communication connections (e.g., USB connection, Bluetooth connection, etc.).

The example embodiment 1350 for this hinting solution preferably has the following properties and capabilities:

1. It is operated by a person's foot. This is ideal because musicians generally have at least one foot not engaged for the vast majority of instruments played.
2. It communicates and works with the MN display, showing status sent by a musician on the display with low latency.
3. The input/output from the device is processed through the MN with low latency (e.g., a response time of less than 10 ms).
4. It is simple to use.

For the embodiment depicted, a footpad control with 2 pressure sensitive pads is used, although 4 pads or other numbers of pads could also be used. Each pad can also include a light by or around it that indicates whether the pad is pressed and, by its brightness, represents how hard it is being pressed. The system has a foot rest pad, which has a rumble motor in/under it. Other haptic feedback mechanisms may also be used. An attention light is also present. The rumble motor or attention light is used to convey events specific to this user. The rumble/attention notifies the user that a peer has updated their status by pressing a pad. A microcontroller circuit in the pad converts the pressure sensor information and sends it over a USB (or similar) IO interface to the MN host system communicating with the pad. The MN also sends control commands down to the pad, such as rumble on (and how hard)/off, attention on/off, and/or other commands. The user, for example, may choose to disable rumble and only rely on the attention light.

When a user presses one or more of the pads, the pressure and the pad number are sent through the IO interface to the MN. The MN broadcasts this information to the peers in the session. The status display of the user is updated in the display, and if the recipient has a hint system attached, the attention/rumble command is sent to it.
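A minimal sketch of these two message hops is shown below; the binary pad-event format and the JSON hint message are hypothetical wire formats chosen for illustration:

```python
import json
import struct

def encode_pad_event(pad_number, pressure):
    """Device -> MN: pack one pad press as a compact binary record."""
    # unsigned byte for the pad number, 32-bit float for the pressure
    return struct.pack("!Bf", pad_number, pressure)

def decode_pad_event(payload):
    pad_number, pressure = struct.unpack("!Bf", payload)
    return pad_number, pressure

def hint_broadcast(sender_id, pad_number, pressure):
    """MN -> session peers: hint message that updates visual elements."""
    return json.dumps({
        "type": "hint",
        "from": sender_id,
        "pad": pad_number,
        "pressure": pressure,
    }).encode("utf-8")

# Example flow: pad press travels from device to MN to the session peers.
payload = encode_pad_event(1, 0.75)
pad, pressure = decode_pad_event(payload)
message = hint_broadcast("MN1", pad, pressure)
```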

The system throttles the frequency at which rumbles are sent to the user's foot to a rate calibrated by the user, but activates the attention indicator for each event. The musician then looks at the status of the peer and, based on a previously agreed interpretation among them, acts accordingly.

Packaged Tunes Service (MAAS—Tunes Sessions)

Music as a Service (MAAS)—Overview.

When considering a distributed, real-time music service of this kind for interactive music sessions, the needs of different classes of musicians can be considered. Musicians who are members of a band can easily use and benefit from this kind of music service by simply joining and participating in freeform sessions, because they already play regularly with their band mates and because they have a shared repertoire of the band's music that they all know how to play together. Likewise, independent professional and/or highly accomplished musicians can benefit from this kind of music service because they have a strong network of other musicians to connect with, and because they can either jam in freeform mode or draw on a deep set of common music while playing in sessions.

In contrast, amateur musicians, who far outnumber the more accomplished and professional musicians above, are not well suited to participate in a freeform, unstructured music service of this nature. They do not have well-established musical relationships with others, they do not share a common repertoire of music pieces, and they do not have the confidence or the ability to simply get online and start trying to play with others in a freeform environment.

The “music as a service” (MAAS) embodiments described herein in part address the needs of the amateur musician by providing a packaged tunes service with a number of features, including Packaged Tunes, Packaged Tune Sourcing, Packaged Tune Library, Local Play, Online Matchmaking, and Online Play, which are described further below. Professional musicians, accomplished musicians, and band members can also take advantage of these innovations.

FIG. 14 is a block diagram of an example embodiment 1400 for a packaged tunes service environment that allows users to access and download packaged tunes for use with a MN or within a music session. The server stores one or more packaged tunes, with each packaged tune including one or more tracks recorded from music sessions or obtained from other sources. The server operates as a tunes session server to allow MNs to download a tune including its respective track recordings. For the embodiment depicted, MN1 has downloaded the tracks for TUNE1 and TUNE3; MN2 has downloaded the tracks for TUNE2 and TUNE3; and MN3 has downloaded the tracks for TUNE1 and TUNE2. The server can also provide these downloads only after a purchase transaction has occurred, such that an MN is required to purchase a tune prior to being allowed by the server to download the tune and its track recordings. Further, the user interface at each MN is used to display information related to the various features of the tunes sessions service described below.

In part, the tunes session service allows users to produce and share or sell songs. The tunes session service also allows a user that has acquired a song to play back the song (e.g., tracks played back in sync, concurrently and mixed) while suppressing (e.g., muting) one or more tracks within the song. The playback may occur at a MN or any device capable of playing audio. The user(s) may also practice playing the tracks that are suppressed.

Packaged Tunes (Songs and Tracks).

Packaged tunes (e.g., recorded tracks associated with songs or musical performances, with one or more recorded tracks being associated with each song or musical performance) represent a structured form of content for a given piece of music. The content and data associated with each packaged tune may include the following (a sketch of one possible data layout follows the list below):

-   Recorded Tracks—These are the track-level recordings of each instrumental and/or vocal component that together make up the master mix of the complete musical performance.
-   Master Mix—This is the master mix recording of the complete musical performance. It is optional and may or may not be included in the content.
-   Music Notation—This is the music notation associated with each individual track (i.e., the musical notes to be played and lyrics for any parts to be sung). This may be displayed in sheet music form, via an animated presentation of notes that are displayed on a musical staff in industry-standard form with the display of the notes timed to correspond to the moment at which they should be played, or in one or more other presentation styles.
-   Meta Data—This content includes data such as the name of the piece of music, a description of the piece of music, the genre of the piece of music, the date the original recording was released, the artists and instruments played on the original recording, and other pieces of data as well.
-   Unique ID (normalized)—Each packaged tune can be associated with a unique identifier (ID) to normalize the music library for the purposes of commerce, royalty tracking, and online matchmaking. The unique ID can be used to identify each packaged tune within the system.
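One possible data layout for this content, sketched with illustrative field names that are not part of the disclosure, is:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RecordedTrack:
    instrument: str                       # e.g., "electric guitar"
    audio_path: str                       # track-level recording location
    notation_path: Optional[str] = None   # sheet music / animated notation

@dataclass
class PackagedTune:
    unique_id: str                        # normalized ID for commerce,
                                          # royalty tracking, matchmaking
    name: str
    description: str
    genre: str
    release_date: str
    artists: list = field(default_factory=list)
    tracks: list = field(default_factory=list)   # RecordedTrack entries
    master_mix_path: Optional[str] = None         # optional master mix

tune = PackagedTune(
    unique_id="tune-0001", name="Example Song", description="demo",
    genre="rock", release_date="1973-01-01",
    tracks=[RecordedTrack("electric guitar", "tracks/guitar.wav")],
)
```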

Packaged Tune Sourcing.

Packaged tunes may be sourced in different ways, depending on the varying desires of the parties involved. For example, the following are examples of how the content can be sourced:

-   Original Performer. In one implementation, the packaged tune is licensed from the copyright holder in its original mastered and commercialized/distributed form. For example, a packaged tune could be licensed for “Freebird” by the band Lynyrd Skynyrd. In this instance, a custom license would be negotiated, and the musician would have access to the track-level masters of each instrumental and vocal performance that together make up this piece of music. The music notation for this piece of music may or may not be included in the content licensed from and delivered by the copyright owner.
-   Cover Bands. In another implementation, if the music service operator prefers, or if the copyright holder does not wish to grant such a license, the music service operator may source packaged tunes from cover bands using a crowd-sourcing content model to aggregate a packaged tune music library. These cover bands may use the distributed music service to generate recordings for the packaged tunes, or may record in any manner they choose, and the music service operator may then upload the tracks that make up a packaged tune into the server systems for the service, regardless of the recording source. Music notation for the piece of music may or may not be included in the content provided by the cover band. In this case, the music service operator would pay a mechanical royalty to the copyright owner, and may or may not also pay a royalty of some kind (up-front, per unit sold, a combination of up-front and per-unit, or no royalty and instead the provision of greater exposure on the service) to the cover bands that generate the recorded tracks for the packaged tune.

Packaged Tune Library.

As a user of the music service downloads each packaged tune (either with or without a purchase of a license to such packaged tune), that packaged tune is added to the personal packaged tune library of that user in the music service. As such, the tunes service is aware of which packaged tunes each user has downloaded.

Local Play.

Once a packaged tune has been downloaded by a user, that user can enter a local session alone and play along with the recorded tracks that make up the packaged tune. Unlike some other aspects of the interactive music service described herein, the user MN is playing alone within the local play and is not communicating with other user MNs across the network. The local play can include one or more of the following features through the MN used by the user (a play scoring sketch follows the list below):

-   Automatic Substitution—Depending on which instrumental tracks a user has configured and specified in the music service that he/she will play, when the user enters a local session, the music service will mute the appropriate recorded tracks automatically. For example, if a packaged tune has recorded tracks for electric guitar, bass guitar, and drums, and the user has a track configured to play his electric guitar, then the service will automatically mute the electric guitar recorded track so that the user can play live in place of this recorded track. The user may also choose to unmute the recorded track, or to half-mute the recorded track to have an audible guide for the track that they are playing, optionally if desired.
-   Music Notation Display—The user may optionally choose to have the music notation displayed for any track they are performing, in any of the presentation styles noted earlier, or if they prefer to play by memory, they may opt not to display any music notation while playing.
-   Play Scoring—The music service may also optionally offer a play scoring service that measures how well the user plays his track or tracks by monitoring which notes are played, when the attack for each note takes place in time, and how long each note is held. The play scoring service can then produce an aggregate play score that indicates how well the user can play each track. This play score can be used by the user to understand how they are doing as they improve through practice, and can also be used by the music service in the online matchmaking feature.
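The play scoring measurement described in the last item can be sketched as a toy function that matches played notes against expected notes by pitch, attack time, and hold duration (the tolerances and scoring weights are illustrative assumptions):

```python
def play_score(expected_notes, played_notes, onset_tolerance=0.05):
    """Toy play scoring sketch: percentage of expected notes matched.

    Each note is a (pitch, onset_seconds, duration_seconds) tuple. A
    played note matches an expected note when the pitch agrees and the
    attack lands within onset_tolerance seconds; how closely the note
    was held contributes partial credit.
    """
    remaining = list(played_notes)
    total = 0.0
    for pitch, onset, duration in expected_notes:
        match = next(
            (p for p in remaining
             if p[0] == pitch and abs(p[1] - onset) <= onset_tolerance),
            None)
        if match is not None:
            remaining.remove(match)
            # Credit for the note, scaled by how close the hold time was.
            total += min(match[2], duration) / max(match[2], duration)
    return 100.0 * total / len(expected_notes) if expected_notes else 0.0

# Example: two expected notes, one played slightly long.
expected = [(60, 0.0, 0.5), (64, 0.5, 0.5)]
played = [(60, 0.01, 0.6), (64, 0.52, 0.5)]
print(round(play_score(expected, played), 1))  # ~91.7
```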

Online Matchmaking.

Once a user has confidence in his ability to play certain tracks in a packaged tune alone, or otherwise chooses to do so, the user can participate in online tunes sessions to play packaged tunes with other users of the interactive music service, combining the interactive music session service and the packaged tunes service. Online matchmaking is used to facilitate online music performances with packaged tunes by allowing users to find tunes sessions within which to participate. For example, online matchmaking suggests tunes sessions that a user may join through one or more of the following features (a matchmaking sketch follows the list below):

-   Packaged Tune Sessions—When a user goes online, the user may create a special kind of session: a session specific to a particular, unique packaged tune. For example, a user could create a tunes session for the performance of the packaged tune “Freebird” by Lynyrd Skynyrd. In this case, the tunes session would be a packaged tune session that carries the unique ID for that specific packaged tune. Only users who have downloaded this specific packaged tune into their packaged tune library would be able to join this specific packaged tune session.
-   Packaged Tune Library—A user interested in joining a packaged tune session can then scan or search available packaged tune sessions. This search feature would automatically determine what packaged tunes are in the user's packaged tune library and would look for existing packaged tune sessions that are configured with the unique IDs of packaged tunes that are in the user's packaged tune library. A listing of the packaged tune sessions that match the packaged tunes in the user's packaged tune library can then be presented in a user interface as prospective packaged tune sessions to join.
-   Packaged Tune Lobby—As an alternative to one user creating a packaged tune session for one specific packaged tune, users interested in playing in packaged tune sessions may join a lobby area. The packaged tunes in each user's packaged tunes library within the lobby area are analyzed to determine their packaged tune IDs, and these packaged tune IDs are then compared to the packaged tune IDs for the packaged tunes within the packaged tunes libraries of the other users in the lobby, as well as all the existing packaged tune sessions that have been set up for a specific packaged tune. The user can then scan a listing of all existing and prospective sessions, and can either join an existing packaged tune session or join one or more users who have not yet created/instantiated a packaged tune session. Joining other users will create/instantiate a packaged tune session with these multiple users around a specified packaged tune that all of these users have in their packaged tune library.
-   Automated Track Analysis—In suggesting and displaying tunes sessions, the online matchmaking also considers the instrumental and/or vocal tracks that a user has selected to play within any packaged tune session. For example, if a packaged tune session has tracks for electric guitar, rhythm guitar, bass guitar, lead vocal, backup vocal, and drums, and if an existing packaged tune session already has live tracks from other users who are playing drums and electric guitar, then a user interested in joining who wants to play the bass guitar track will see this track within the packaged tune session as a viable option for joining the session. However, if the user instead wants to play the drums track that is already being played, this packaged tune session will not be seen by the user as a viable option for joining the session. Similarly, in the lobby area, two users who both want to play the electric guitar track for a packaged tune that they share in common in their packaged tune libraries would not be matched as potential users for a common tunes session.
-   Network Scoring—The network scoring described above can also be used as a filter in the selection and ordering of packaged tune sessions available for a given user, as it will favor the presentation order of packaged tune sessions that are expected to provide a higher level of user experience, such as packaged tune sessions having low latency, low jitter, etc.
-   Play Scoring—Users may also see through the user interface the play scores of other users for the packaged tunes in each packaged tune session, enabling users to better select packaged tune sessions to join. For example, sessions having other users of comparable skill levels are likely good selections for a user to join in order to avoid either frustration or embarrassment for the user within the session. In addition to seeing the displayed play scores, the user may also select to filter out packaged tune sessions with users based upon specified play scores. For example, only users having play scores above or below a selected play score will be shown. Other play score parameters may also be selected, such as ranges of play scores within which a user must fall in order to be shown.
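The library matching and automated track analysis described above can be combined into a small filtering sketch; the session field names are illustrative assumptions:

```python
def suggest_sessions(user_library, user_track, sessions):
    """Suggest joinable packaged tune sessions for a user.

    user_library is a set of packaged tune unique IDs the user has
    downloaded; user_track is the track the user wants to play; each
    session is a dict with "tune_id" and "taken_tracks" keys.
    """
    return [s for s in sessions
            if s["tune_id"] in user_library           # library match
            and user_track not in s["taken_tracks"]]  # track still open

sessions = [
    {"tune_id": "tune-0001", "taken_tracks": {"drums", "electric guitar"}},
    {"tune_id": "tune-0002", "taken_tracks": {"bass guitar"}},
]
print(suggest_sessions({"tune-0001"}, "bass guitar", sessions))
# -> only the tune-0001 session, since its bass guitar track is open
```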

Online Play.

When a user enters a packaged tune session with other users, the automatic substitution and music notation display features described above with respect to the local play feature are also used and available for online play. Also, during or after a packaged tune session, each user in the packaged tune session is allowed to rate the performance capabilities of the other users in the session. As such, impartial third party ratings of a user's skill level can be generated and stored with respect to the specific packaged tune that was part of the tunes session. These user ratings may then be used in the online matchmaking feature described above, in addition to machine-based play scores that may be generated for a user.

Track Recordings and Skew.

As described with respect to high fidelity recording above, during a session, each MN produces one or more high fidelity tracks (R_(ai)) that are uploaded to the server. As described above, these tracks are skewed in time relative to each other, based on the time delay in starting the recording at each location. To produce a final cut of each track, it is preferable to correct or adjust the start time skew in the high fidelity audio files. As also described above, an accurate reference clock common to all MNs in the session is used to timestamp each recording start with that reference clock time. Similar to the example above, with this reference clock timestamp, the algorithm below can be used to produce final tracks that are synchronized:

-   -   1. Sort the high fidelity recordings (R_(ai)) by timestamp.
    -   2. The most recent (largest) timestamp represents the recording that started latest (t_(OLD)).
    -   3. For each recording (R_(ai)), the delay (t_(Di)) relative to the latest start time is represented as t_(Di)=t_(OLD)−t_(STARTi), where t_(STARTi) is the record start time for R_(ai).
    -   4. The delay (t_(Di)) is the time offset in recording R_(ai) that must be skipped to bring the recording into alignment with the recording having the latest start.
    -   5. The final track recording (TR_(ai)) for each recording is produced by discarding t_(Di) worth of data from the start of the recording and then writing the result to the final track file. Automated or manual calibration can also be used to tune this process. Each final track represents one or more instruments or voices that together as a set represent a song or performance. Assume N tracks are in a song. Then the final song track (TR_(song)) can be represented as the set of the individual tracks within the song such that TR_(song)={TR_(a1), TR_(a2), . . . , TR_(aN)}.
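
As an illustration only, the following Python sketch shows one way the above alignment could be implemented, assuming raw PCM audio at a known sample rate; the Recording type, sample rate, and sample width are hypothetical values chosen for the example and are not part of any disclosed embodiment:

    from dataclasses import dataclass

    SAMPLE_RATE = 48_000     # samples/second (assumed)
    BYTES_PER_SAMPLE = 2     # 16-bit PCM (assumed)

    @dataclass
    class Recording:
        node_id: str
        start_time: float    # reference clock timestamp of record start (seconds)
        pcm: bytes           # raw high fidelity audio data

    def align_tracks(recordings):
        """Trim each recording so all final tracks start at the latest record start."""
        t_old = max(r.start_time for r in recordings)      # latest start time (t_OLD)
        final_tracks = {}
        for r in recordings:
            t_d = t_old - r.start_time                     # delay t_Di to discard
            skip = int(t_d * SAMPLE_RATE) * BYTES_PER_SAMPLE
            final_tracks[r.node_id] = r.pcm[skip:]         # final track TR_ai
        return final_tracks                                # the set forming TR_song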

Tunes Service with Music Session.

Further, using the tunes service with respect to a music session, a set of tracks may be played back for instruments that are not available in the session while muting another set of tracks for instruments that are available in the session. Two example modes of song track playback are now described for the set of tracks (TR_(song)) that are played back to users in a music session.

Single Source Track Playback.

The single source track playback mode is where one MN is the source of the song tracks being played back for all users in the session. This MN plays the song tracks and streams them at low latency to the other nodes in the session, mixed with the other input tracks at the MN. In this mode, the song playback tracks will experience all of the jitter and packet loss effects in the network being experienced by the MN.

Distributed High Fidelity Track Playback.

In this distributed high fidelity track playback mode, the content of the tracks of the song is securely distributed to a prescribed set of MNs in the session. The set of MNs receiving the tracks can be determined by a number of factors such as DRM (digital rights management) policies, MN capability, user preferences, other factors, and/or a combination of these factors. As with the live track recordings (R_(ai)), the interface for the session shows a common, session-global track control for each song track at each MN location, enabling any user in the session to control the track volume, effects, mute, etc. for the whole session.

In this high fidelity mode, the song tracks at each MN are played back only as outputs for that MN. Because the tracks are played back locally, the following benefits are provided: (1) no artifacts are introduced due to processing through a jitter queue and/or due to network artifacts, (2) high fidelity is provided because the tracks are not compressed for streaming, and (3) no latency is introduced.

This high fidelity mode requires that playback of the tracks be started and played in a synchronized manner if synchronization is desired, for example, in a music session. The process described above for the distributed metronome can also be used for this synchronization. When a user presses the "play" button, a "play start" command is sent to the MNs in the session directing them to start playing. The following describes an example embodiment for this process:

-   -   1. Each MN knows the network latency between it and every other MN in the session, as described above, and the maximum latency (t_(MAX)) for its peer-to-peer connections can be determined from these latencies.
    -   2. Let the reference clock time for the MN at which the play start is initiated be represented by t_(REF). The initiating MN broadcasts a "play start" command to all peer MNs within the session indicating that the start time for the "play" is to be t_(START)=t_(REF)+2t_(MAX). Twice the maximum latency (2t_(MAX)) is used as a conservative approach, although a lower start time bound of t_(START)=t_(REF)+t_(MAX) could also be used, as well as other later start times.
    -   3. An MN receiving the play start command waits until its reference clock time (t) is about the designated start time (e.g., t≅t_(START)). The accuracy of local clocks is typically on the order of ±1 ms. If the designated start time (t_(START)) is earlier than the current reference clock time (t) for the MN receiving the start command (e.g., t_(START)<t), then the command is late, and the receiving MN re-broadcasts a new start time with an increase to the 2× multiplier for its maximum latency (t_(MAX)) to compensate for the unexpected lateness of the command.
    -   4. Clocks at the MNs are assumed to be relatively matched in drift. Thus, the starting time is important for them to remain in synchronization.
    -   5. Audio from the high fidelity tracks is played only to that MN's output. Thus, the track playback has no latency and is synchronized across the session.
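
As a minimal sketch of steps 2 and 3 above, assuming latencies and clock times in seconds and a hypothetical local play() callback (the function names are illustrative only):

    import time

    def play_start_time(t_ref, t_max, multiplier=2.0):
        # Conservative start time: t_START = t_REF + 2*t_MAX.
        return t_ref + multiplier * t_max

    def wait_and_play(t_start, play, clock=time.monotonic):
        # Wait until the local reference clock reaches the designated start time.
        # (A shared reference clock is assumed; time.monotonic is a stand-in.)
        now = clock()
        if t_start < now:
            # The command arrived late; the text calls for re-broadcasting a new
            # start time with an increased multiplier rather than playing late.
            raise RuntimeError("play start command is late; re-broadcast new start time")
        time.sleep(t_start - now)
        play()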

Match Making and Socialization Using Tunes Sessions.

As described herein, after practicing playing tracks in songs, a user may desire to play a track in a session with other musicians. Similarly, a session creator may desire to find users capable of playing particular tracks of a song in a session. The online matchmaking service allows discovery and matching of capability and need for song and track playback in music sessions. The following are further examples of how this service can be utilized:

-   -   1. Musicians list the song tracks that they are capable of playing. They also indicate their competency level.
    -   2. Session organizers list the songs that they plan to play in a session and the tracks for which they are seeking musicians. The session organizer also indicates the time/date of the session.
    -   3. A musician can search for sessions matching his/her capability/interest within a geographic zone. He/she is also allowed to subscribe to the session.
    -   4. A session organizer can search for musicians matching the session's needs. The session creator may invite or accept/reject subscriptions. Once the needs of the session are met, the creator may close the session from accepting further subscriptions.
    -   5. The system can rank the subscriptions to the listed session by one or more of a variety of factors, which can include:
        -   Friendship—the subscriber is a friend of the session creator.
        -   History—the subscriber has played the track for the song in previous sessions.
        -   Competency—the user's indicated competency compared with the requested session competency.
        -   Latency—the expected or actual latency between the session creator's designated MN and the subscriber's MN.
        -   User scoring/ranking—based on the score of the subscriber on this track as well as the subscriber's overall score. Users are enabled to score each other.
        -   Other—one or more other selected factors.

Embodiments will now be further described with respect to APPENDIX A, APPENDIX B, and APPENDIX C below. APPENDIX A includes further details of MN registration and control with respect to network-connected devices and with respect to a network connection service (Network as a Service—NAAS) that provides lower latency network communications for music sessions. APPENDIX B below provides further functional block diagram examples for the interactive music system and related music nodes (MNs) and the server system(s). APPENDIX C below provides example APIs (application program interfaces) that can be utilized.

Appendix A Network Data Streams and NAAS (Network as a Service)

The MN application works by sending and receiving audio stream data from one or more other MN application instances located in the network. Audio data is encoded and sent to multiple recipients, and audio data is received from the same recipients, decoded, and mixed before being played. Because latency is important, care is taken to minimize latency, perhaps at the expense of increased network bandwidth. One aspect of that is sending smaller chunks of audio data more frequently.

There are two sources of audio, one being music from an instrument or microphone, and the second being chat sent from a microphone/headset. The chat audio is optional.

In one embodiment, the music stream includes up to 256 kilobits/second of captured and encoded audio data, chopped up into frames as small as 2.5 milliseconds (400 frames/second). This frame size provides for about 82 bytes per frame (assuming a byte is 8 bits). An optional chat stream can also be included with an additional maximum of 64 kilobits/second of audio data, or 21 bytes per frame. Headers or other wrappers are used around these two frames to distinguish their purposes (e.g., type, seq (sequence number), uid (user identifier)) at a cost of 9 bytes each. So, as one example, 82+9 bytes are used for music and 21+9 bytes are used for chat, leading to a total of 91 bytes for music and 30 bytes for chat, or altogether 121 bytes. An IP/UDP (internet protocol/user datagram protocol) header wrapped around that is an additional 28 bytes, for a total packet size of 149 bytes per frame at 400 frames per second. The total resulting bit rate is 477 kilobits/second (from a combined input of 320 kilobits/second), an increase in bandwidth of 49% due to overhead. It is noted that this is one example packet structure that can be used for network communications for the interactive music system embodiments described herein, and other packet structures could also be utilized.
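
The arithmetic above can be checked with a short Python calculation; the values are taken from the example packet structure and are not required by the embodiments:

    FRAMES_PER_SECOND = 400              # 2.5 ms frames
    MUSIC_BYTES, CHAT_BYTES = 82, 21     # per-frame audio payloads
    WRAPPER_BYTES = 9                    # type, seq, uid (per stream)
    IP_UDP_HEADER = 28                   # IPv4 (20 bytes) + UDP (8 bytes)

    payload = (MUSIC_BYTES + WRAPPER_BYTES) + (CHAT_BYTES + WRAPPER_BYTES)  # 121 bytes
    packet = payload + IP_UDP_HEADER                                        # 149 bytes
    kbps = packet * 8 * FRAMES_PER_SECOND / 1000                            # 476.8 kbit/s
    overhead = kbps / (256 + 64) - 1                                        # vs. 320 kbit/s input
    print(packet, round(kbps), f"{overhead:.0%}")                           # 149 477 49%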

The overhead matters as it increases transmission time and the load on network equipment. Many home users have asymmetric network connections that have a smaller upload capability than download. Often a home user is limited to only 1-3 megabits/second for upload, while corresponding download capabilities range from 5-30 megabits/second. If a jam music session is being carried on with 5 users, four of them remotely located, the total data upload requirement is 477*4=1,908 kilobits/second. This is very close to the limit of many home users' upload capability, and out of reach for a significant fraction.

Also, for this five-piece band using the maximum frame rate, 1,600 frames per second are being sent up to the internet from each member. Experiments have shown that this frame rate can swamp most home networking equipment. When frames come too fast, frame processing gets bogged down. This can cause delays in passing the frames through to the internet from the local network. Temporary bursts can often be absorbed by buffering the excess frames and sending them as soon as possible, but when frame rates are persistently higher than buffering can handle, another solution is employed: drop the excess. Example embodiments are described above for buffering using a jitter queue and dropping packets at the end of time windows.

While frames are being sent, the same 1,600 frames per second are being received, likewise at 1,908 kilobits/second. This load will further degrade the performance of the home networking equipment. Often the result of this degradation is that frames are delayed or dropped outright. This can cause the audio streams to lose synchronization or sound fuzzy or even choppy. Late frames are the same as dropped frames, further degrading audio quality.

Finally, once frames are on the internet, they can take complicated and variable paths to their destinations. Two users both on Time Warner's network in Austin will have a different (and perhaps shorter) path between them than two users where one is on one ISP (e.g., Time Warner) and the other is on another ISP (e.g., AT&T). If the users are in different cities, that adds additional path variability. Finally, equipment congestion, failures, and maintenance might introduce even more path variability. Different paths have different capabilities and loads as well. Path variability matters because each path induces delay. For a given path, the delay may vary minute to minute, even second to second.

Thus, items of concern for the network communications among the participants within the interactive music system include: (1) bandwidth, (2) delay, and (3) reliability.

So, the NAAS (network as a service) embodiments described herein are used to improve upon the server services described above by reducing latency for communications within the interactive music system. While some latency still exists for audio encoding and decoding, the upload and download bandwidth requirements can be better managed using the NAAS embodiments, and the network path variability can be better managed for a large class of users.

Bandwidth

As indicated above, bandwidth is increased by 49% due to encoding of the audio, breaking it up into frames, and then wrapping the frames to form network communication packets. Bandwidth is also multiplied by a factor that corresponds to the number of other participants in the session. Let's look at each step:

-   -   1. Encode—Audio encoding likely cannot be significantly adjusted. Any attempt to compress audio more than it is already compressed will likely add delay (e.g., once the audio is presented to the networking layer).
    -   2. Wrap (e.g., type, seq, uid)—Wrapping is useful to separate audio streams from different sources and to manage missing and out-of-sequence frames.
    -   3. Wrap with UDP—A protocol, such as UDP, is used to transmit the data across the internet. It is possible, however, to carry more data in a single UDP frame to eliminate 28 bytes per frame of excess wrapper. This variation is described in more detail below.
    -   4. Upload to each participant—This has a large effect, as it is not just a percentage bigger; it is integral factors bigger. When there are more than two participants in a session, the same exact data is being sent more than once to the different participants. If this data could be sent once and then be resent or multicast to the other participants, bandwidth needs and latency could be greatly reduced.

Upload performs these steps in the order specified. The obvious thing to pick on, the biggest, is step 4. If step 4 can be optimized by utilizing some sort of multicast capability, as many MNs as desired can be supported within a music session while only requiring 400 frames per second of upload at a rate of 477 kilobits/second. This is well within the capability of most home internet users. This is a dramatic savings in both upload bandwidth and frame count. Also, more home routers can handle this lower frame rate, and so the number of potential users increases.

This is called upload scattering.
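
The bandwidth effect of upload scattering can be sketched as follows in Python, using the 477 kilobits/second stream from the example above; the function names are illustrative only:

    STREAM_KBPS = 477            # one participant's combined upload stream
    FRAMES_PER_SECOND = 400

    def direct_upload(remote_peers):
        # Without scattering, the same stream is uploaded once per remote peer.
        return STREAM_KBPS * remote_peers, FRAMES_PER_SECOND * remote_peers

    def scattered_upload(remote_peers):
        # With scattering, one copy goes up and the server fans it out.
        return STREAM_KBPS, FRAMES_PER_SECOND

    print(direct_upload(4))      # (1908, 1600) -- five-piece band, four remote peers
    print(scattered_upload(4))   # (477, 400) -- independent of session size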

Download performs these steps (more or less) in the reverse order. Multiple participants across the internet upload and send audio data, and the local MN subsequently downloads this data, unwraps it, and decodes the audio streams. The MN then combines the various audio streams into a single audio stream which is played out at the MN, such as through a speaker. As indicated above, the user has the option of controlling the volume of each individual participant's contribution to what is being heard.

The obvious best case would be to download a single audio stream and play it out of a speaker. This would require significant processing in the internet at server systems to completely unwrap and decode the audio streams from each participant, combine them into a single stream taking into account the volume settings for each stream, and then encode and rewrap the result before downloading it to a participant. As with upload, this would support (assuming infinite computational ability in the internet) as many participants as would be liked in a session while only requiring 400 frames per second of download at a rate of less than the 477 kilobits/second upload requirement.

The computational ability in the internet server systems is called into question, of course, as it adds additional delay and expense, plus difficulty accounting for each participant's volume settings and mechanisms for manipulating those, etc. It also requires code in the internet server systems to decode and encode audio, mix it, wrap and unwrap it, etc. This is not an easy capability to deploy, maintain, debug, etc.

For one embodiment, during each 2.5 millisecond slice of active session time, one frame from each participant will be received on average. These frames are combined together in the internet NAAS server systems, and these combined frames are downloaded from the server systems by the MNs as a single UDP packet. This combining of frames reduces the download frame count from the server systems and also reduces bandwidth requirements.

The audio data from frames (e.g., audio data from audio data frames or audio plus video data frames) in packets received from multiple MNs can also be combined together by the NAAS server systems, and this combined audio data can be downloaded from the NAAS server systems to the MNs as a single UDP packet. This combining of audio data from communicated frames reduces the packet rate that must be processed by the MN router and also reduces bandwidth requirements on the receiving MN's Internet service provider (ISP).

To quantify these savings, assume four remote participants generating 121 bytes of UDP payload per frame (see above). That's a total of 484 bytes of payload if these frames are mashed together. Adding a UDP wrapper, this becomes 512 bytes total size, or 1,638 kilobits/second. This is not a big improvement over the 1,908 kilobits/second for normal non-optimized download (14%). But only 400 frames/second are downloaded instead of 1,600, which is of course a quite dramatic improvement. Home routers will be happier.
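
As an illustrative Python check of these numbers, using the assumed values from the example above:

    FRAMES_PER_SECOND = 400
    UDP_PAYLOAD = 121          # per-participant payload per frame (music + chat + wrappers)
    IP_UDP_HEADER = 28

    def aggregated_download(remote_peers):
        # One frame per peer per 2.5 ms slice, mashed into a single UDP packet.
        packet = remote_peers * UDP_PAYLOAD + IP_UDP_HEADER    # 4*121 + 28 = 512 bytes
        kbps = packet * 8 * FRAMES_PER_SECOND / 1000           # 1638.4 kbit/s
        return packet, kbps, FRAMES_PER_SECOND

    print(aggregated_download(4))   # (512, 1638.4, 400) vs. 1908 kbit/s at 1600 frames/s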

So, rather than sending payloads immediately to the intended recipient, the server waits to see if it can gather up a few more to group together. However long it waits, it is delaying the earliest packet by that much.

This is called download aggregation.

Delay

Another factor affecting audio quality is delay. The total delay of a frame is the total of all the delays along the path from one participant (A) to another (B). This includes at least the following:

-   -   Encoding delay (2.5 ms)    -   Processing to wrap and transmit (small delay)    -   Transmit to home network equipment (4 ms)    -   Transmit from A to A's ISP (variable delay)    -   Wander from A's ISP to B's ISP (variable delay)    -   Transmit from B's ISP to B (variable delay)    -   Transmit from home network equipment (4 ms)    -   Processing to receive and unwrap (small delay)    -   Decoding delay (jitter buffer delay)
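
As an illustration, the end-to-end delay is simply the sum of these components; the variable entries below are hypothetical millisecond values chosen for the example, not measurements:

    delays_ms = {
        "encode": 2.5,
        "wrap_and_transmit": 0.5,      # "small delay" (assumed)
        "to_home_equipment": 4.0,
        "a_to_a_isp": 5.0,             # variable (assumed)
        "a_isp_to_b_isp": 10.0,        # variable (assumed)
        "b_isp_to_b": 5.0,             # variable (assumed)
        "from_home_equipment": 4.0,
        "receive_and_unwrap": 0.5,     # "small delay" (assumed)
        "jitter_buffer_decode": 5.0,   # variable (assumed)
    }
    print(f"total A-to-B frame delay: {sum(delays_ms.values()):.1f} ms")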

The big delays here have to do with the ISP delays and internet delays. If A and B are both in the same locale and use the same ISP, this is as good as it can get (except if they are in the same house).

FIG. 15A is a block diagram of an embodiment 1500 including two music nodes (A, B) communicating with each other through an ISP.

Likely the data moves from A to B on equipment located on private high-speed networks operated by the ISP. Still, the delay could be 5-10 ms if located in the same locale.

When A and B are on the same ISP in different locales, then the fun begins. Topology and style vary greatly among different ISPs, but it is likely that some of the data will traverse some public networks. Some ISPs might tie each locale to the internet directly, while others may tie all their private networks together and then tie them to the internet at a few key points.

When A and B are on different ISPs, it looks a lot like the above case, but perhaps even more complicated. Suppose A is on Time Warner in Austin and trying to route data to B on Comcast in Austin. What if A's data first hits the internet in Dallas and then has to get to Minneapolis to get into Comcast? Data moving across town goes from Austin to Dallas to Minneapolis and then back to Austin. And who's to say that data moving across the internet from Dallas to Minneapolis is a single hop?

FIG. 15B is a block diagram of such an embodiment 1510 including two music nodes (A, B) communicating with each other through different ISPs. For the embodiment depicted, A is located in Austin and uses Time Warner as its ISP, which has its direct internet backbone connection systems in Dallas. B is located in Austin and uses Comcast as its ISP, which has its direct internet backbone connection systems in Minneapolis.

To address these delays, NAAS server systems can be located at strategic points on both Time Warner's and Comcast's networks in Dallas. Data trying to move between the two in Austin might merely need to utilize the NAAS server in Dallas to jump directly from Time Warner's network to Comcast's network. Customers in Dallas would benefit the most, perhaps, but users within a few hundred miles of Dallas might certainly be better off than otherwise.

FIG. 16 is a block diagram of an embodiment 1600 including NAAS server systems 1602 connecting two independent ISPs. For the embodiment depicted, A is located in Austin and uses Time Warner as its ISP, and B is located in Austin and uses Comcast as its ISP. However, unlike FIG. 15B, the NAAS server systems 1602 provide network connection services between the two different ISPs and thereby reduce the latency of communication between the music nodes (A, B).

This is called path optimization.

A more advanced system might allow user A to hit one of our servers near his locale, with the data flowing across a backbone network to another of our servers near B's locale before being delivered to B.

This can be called advanced path optimization.

Setting Up a Session without NAAS

Just to put it all in context, let's look at how a non-NAAS session is set up. The first participant creates a session and then invites the other two to join. In the end, they are each sending audio streams to the other two:

FIG. 17 is a block diagram of an embodiment 1700 including three music nodes (A, B, C) communicating with each other and with the server systems to set up a non-NAAS music session.

A is the name of a participant, as are B and C. The solid line between each pair of participants indicates the bi-directional flow of data. To accomplish this setup, here are the necessary steps:

-   -   1. A starts the session
    -   2. B joins the session
    -   3. B is told about A
    -   4. A is told about B
    -   5. C joins the session
    -   6. C is told about A
    -   7. C is told about B
    -   8. A is told about C
    -   9. B is told about C
        As each participant is "told" about another, the told participant begins to send data to the participant it was told about.

In a like manner, the session is torn down with a similar set of steps:

-   -   1. C leaves the session    -   2. A is told that C left    -   3. B is told that C left    -   4. B leaves the session    -   5. A is told that B left    -   6. A stops the session        There are fewer steps because when C leaves, C doesn't need to        be told anything about A or B, etc. It is noted that example        message sequences for starting and stopping a non-NAAS session        are described below.
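
The "told" bookkeeping for joining and leaving can be sketched in Python as follows; this is an illustrative model of the notification fan-out, not actual server code:

    def join(session, newcomer):
        # The newcomer and each existing member are told about each other; each
        # "told" node then starts sending data to the node it was told about.
        notices = [(m, newcomer) for m in session] + [(newcomer, m) for m in session]
        session.add(newcomer)
        return notices

    def leave(session, departer):
        # Only the remaining members need to be told; the departer is told nothing.
        session.discard(departer)
        return [(m, departer) for m in session]

    session = set()
    for node in ("A", "B", "C"):
        join(session, node)
    print(leave(session, "C"))   # remaining members A and B are told that C left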

FIG. 20A is a swim lane diagram of an example embodiment 2010 for a music session start by music node A where music nodes B and C then join the session. The swim lane diagram includes the interactive music system server and music nodes A, B, and C.

FIG. 20B is a swim lane diagram of an example embodiment 2020 for a music session stop where music nodes B and C leave the session. The swim lane diagram includes the interactive music system server and music nodes A, B, and C.

How NAAS Works

To be effective, NAAS server systems are preferably directly connected to as many ISP networks as are important in a given locale. This means one interface for each ISP network (e.g., the ISPs for MNs 1-4 in FIG. 18A discussed below) and thus one address per ISP network as well. In order to determine which address of a NAAS server a participant should use, it is useful to know the ISP network for the participant and match that to the ISP's network address on a NAAS server. If the participant's ISP is not represented (e.g., the ISP for MN 5 in FIG. 18A below), then one way to determine which address is best is to test them all. Given the difficulty of "knowing" and "matching," it seems better to just have the participant test each address of a representative sample of nearby NAAS server systems to determine the proper address to use. It is further noted that the network interfaces for the NAAS server systems include physical interface implementations, virtual interface implementations, or combinations thereof.

FIG. 18A is a block diagram of an embodiment 1800 including NAAS server systems 1602 providing communications among four music nodes for a music session. The NAAS server systems 1602 have direct connections to the ISPs for music nodes 1, 2, 3, and 4, but do not have a direct connection to the ISP for music node 5.

The participant will send data to the best address of the NAAS, and the NAAS will forward the data to the other participants in the session using the address for each of them. Data coming from the NAAS to a participant will be "from" the best address at the NAAS for that participant.

Let's suppose there are three participants, A, B, and C, in a session. A and B are on ISP network 1, while C is on ISP network 2. A and B will use the NAAS address for ISP network 1, while C will use the address for ISP network 2:

FIG. 18B is a block diagram of such an embodiment 1820 including three music nodes (A, B, C) communicating with each other through two different ISPs. Because A and B are on the same ISP, the NAAS server systems 1602 use one direct connection (N1) for communications to/from A and B. For C, which is on a different ISP, the NAAS server systems 1602 use another direct connection (N2) for communications to/from C.

When A sends data to N1, the NAAS sends it to B and C. Data sent by B to N1 will go to A and C, and data sent by C to N2 will go to A and B. Data sent to A from the NAAS will be from N1, likewise N1 for B and N2 for C. This is the situation when all three of A, B, and C are authorized to use NAAS. Here it is in tabular form:

    If Received            Then Send
    From    Interface      To      Interface
    A       N1             B       N1
    A       N1             C       N2
    B       N1             A       N1
    B       N1             C       N2
    C       N2             A       N1
    C       N2             B       N1

The first row is read as "if data is received from A using interface N1, then the NAAS should send it to B using interface N1." The information in row 3 is a mirror image of the information in row 1. This fact can be used to compress the tables (not shown above).

Note also that the received data is matched against only the first two columns of each row. Where multiple rows are matched, all are triggered. In the table above, "received from A/N1" matches two rows, one "then send to B/N1" and one "then send to C/N2."

As the play session is started and participants join it, the NAAS server system is updated with these rules. As participants leave, the rules corresponding to the participant are removed. Any data arriving from a source not in the table is ignored.

Note that A only sends one copy of the data to the NAAS. The NAAS forwards two copies, one to B and one to C.
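
The forwarding behavior of this table can be sketched in Python as follows; this is an illustrative model only, not the NAAS implementation:

    # Each rule: (from_node, in_interface, to_node, out_interface).
    rules = [
        ("A", "N1", "B", "N1"), ("A", "N1", "C", "N2"),
        ("B", "N1", "A", "N1"), ("B", "N1", "C", "N2"),
        ("C", "N2", "A", "N1"), ("C", "N2", "B", "N1"),
    ]

    def forward(source, interface, rules):
        # Match on the first two columns only; every matching rule fires.
        return [(to, out) for (frm, inp, to, out) in rules
                if (frm, inp) == (source, interface)]

    print(forward("A", "N1", rules))   # [('B', 'N1'), ('C', 'N2')]
    print(forward("X", "N1", rules))   # [] -- data from unknown sources is ignored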

The NAAS server can be implemented with or without download aggregation, if desired. For example, download aggregation can be omitted while upload scattering and path optimization are still provided by the NAAS server systems. When not all the participants in a session are enabled to use NAAS, those participants do not get to use the features of NAAS directly. They will continue to send packets individually to each other participant. But instead of sending to NAAS participants directly, they will send to the appropriate NAAS address for such participants instead.

For traffic that goes through the NAAS server system, single-stream upload packet communications and multicast outbound packet communications to the other MNs in the music session can be used. This multicasting saves bandwidth and packet rate on the sending MN, and can also enable delivery of bandwidth-hungry payloads like video, which could otherwise require too much bandwidth to send to the other MNs in the music session, for example, due to typically asymmetric bandwidth (e.g., constrained uplinks).

It is further noted that to connect MNs over greater distances via latency-optimized links, MNs may connect to different NAAS server systems, and the different NAAS server systems can be connected with a high-speed backbone, or direct communication links can be provided between such NAAS server systems. It is also noted that if all MNs in a session are connected (e.g., proxied) through a NAAS server system, the MNs can have the NAAS server capture and process audio or video-plus-audio recordings, download them after the session to the MNs, and/or upload them automatically to another network destination (e.g., YouTube, etc.). It is further noted that if the MNs in a session are connected (e.g., proxied) through a NAAS server system, the MNs can have the NAAS server mix the audio data from the MNs at the NAAS server system and send back the fully processed and mixed audio data (e.g., audio mix) to each MN in the music session. This avoids each MN having to process and mix the streams of all MNs to form the mixed audio. In addition, it is noted that the NAAS server system can be configured to store a recording of the audio mix within one or more data storage systems, and the NAAS server system can then broadcast the audio mix recording to one or more network destinations. It is still further noted that the NAAS server systems are preferably placed at IXPs (Internet Exchange Points) and directly connected to these IXPs. An IXP is the network infrastructure device or devices where the ISPs physically cross-connect with each other and communicate peer traffic across their networks. As such, if a NAAS server system is physically co-located at an IXP, this NAAS server system will effectively be cross-connected to the major ISPs that service a region through this IXP, and NAAS proxied latency will be minimized for MNs communicating through the NAAS server system.

FIG. 19 is a block diagram of an embodiment 1900 including three music nodes (A, B, C) where only A is a NAAS participant.

Suppose that B and C are not NAAS participants, and only A is a NAAS participant. The above table is modified as follows:

    If Received            Then Send
    From    Interface      To      Interface
    A       N1             B       N1
    A       N1             C       N2
    B       N1             A       N1
    C       N2             A       N1

The rules relating to B sending to C and C sending to B are absent. B and C must continue to send directly to each other.

In this way A sees a reduction in his upload bandwidth utilization, while B and C don't. A's data sent to B and C also enjoys path optimization, as does B's and C's data sent to A. But B's and C's data sent to each other is not path optimized, and neither B nor C sees any reduction in upload bandwidth utilization.

Note that if B is a NAAS user as well as A, then C will reap the full benefits of being a NAAS member without having to pay. In general this is true whenever N−1 participants are NAAS users.

As described in the session setups below, automated discovery of the lowest latency path from an end-user MN to one interface on a NAAS server system can be accomplished, for example, by ping testing against all the interfaces/ISPs across some subset of the NAAS server systems in different regions. This automated discovery can also be repeated over time so that the interface used by the MN is dynamically adjusted based upon the latency determination. Further, the NAAS server systems pinged as part of this latency testing can be limited by parameters such as geographic location and related distances in order to avoid NAAS servers whose geographic distances make them unlikely low latency candidates. Different NAAS server systems can also communicate with each other as part of this latency testing.

There is a possibility that, since A and B are on the same ISP network, A and B would be better off sending directly to each other. A is now faced with a tradeoff: enjoy the benefit of upload scattering, or use the better path to B. In order to make that choice, A would need to test whether sending to B via N1 was better than sending directly to B. If the choice was made to use the direct path, the NAAS would have to be told to remove any entries from the configuration table involving A to and from B. A would also want to test B's address first to see if it was indeed the best path to use.

Thus, each MN in a music session can make an automated determination of the latency for peer-to-peer communications and the latency for NAAS server communications (e.g., proxied latency) to see which latency is better with respect to communications with each other MN in the music session. The lowest latency communications can then be used for the music session. It is noted that the NAAS server latency can be determined for two MNs (e.g., MN1, MN2), for example, by adding the MN1-to-NAAS latency to the NAAS-to-MN2 latency (this sum equals the NAAS proxied latency from MN1 to MN2). This NAAS server latency can then be compared with the simple peer-to-peer (MN1-to-MN2) latency. The lower latency path can then be selected and used for communications for the music session.
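
A minimal sketch of this comparison in Python, assuming latency measurements in milliseconds are already available (the values below are hypothetical):

    def select_path(p2p_ms, mn1_to_naas_ms, naas_to_mn2_ms):
        # NAAS proxied latency MN1 -> MN2 is the sum of the two NAAS legs.
        proxied_ms = mn1_to_naas_ms + naas_to_mn2_ms
        if proxied_ms < p2p_ms:
            return "naas", proxied_ms
        return "peer-to-peer", p2p_ms

    # Hypothetical measurements: direct path 38 ms, NAAS legs 12 ms and 14 ms.
    print(select_path(38.0, 12.0, 14.0))   # ('naas', 26.0)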

It is further noted that, if possible, this session traffic can be routed based on lowest latency connection determinations (e.g., peer-to-peer path or NAAS proxied path), and this can then be adjusted if packet rate or bandwidth constraints cause the lower latency path to be unsatisfactory for session communications. For example, if packet rate and/or bandwidth constraints present communication problems, an intelligent tradeoff can be made between the different connection paths (e.g., between the peer-to-peer path and the NAAS proxied path) so that communications stay within bandwidth and/or packet rate constraints while reducing the average or median latency across the connections in the session. Further, MNs may continuously check the latency to the NAAS/peers and may elect, or be directed by the NAAS server, to dynamically migrate connections to another NAAS or from NAAS mode to peer-to-peer mode (or vice versa) if network conditions, NAAS load parameters, or other parameters indicate these adjustments are to be made. For example, a ping test can be followed by a decision to migrate that causes an MN to leave and re-join a music session with the new parameters in effect. Other variations could also be implemented while still taking advantage of this session migration, and a variety of session migration protocols can be used to determine when an MN migrates and/or is instructed to migrate by the server.

Session Setup with NAAS

Setting up a session with NAAS (everyone enabled) looks like this:

-   -   1. A starts the session    -   2. A told to test NAAS addresses (N1, N2, N3, N4)    -   3. A determines that N1 has the lowest latency    -   4. B joins the session    -   5. B told to test NAAS addresses (N1, N2, N3, N4)    -   6. B determines that N1 has the lowest latency    -   7. NAAS is told to add a rule (A, N1, B, N1)*    -   8. B is told about A (N1)**    -   9. A is told about B (N1)    -   10. C joins the session    -   11. C told to test NAAS addresses (N1, N2, N3, N4)    -   12. C determines that N2 has the lowest latency    -   13. NAAS is told to add a rule (A, N1, C, N2)    -   14. NAAS is told to add a rule (B, N1, C, N2)    -   15. C is told about A (N2)    -   16. C is told about B (N2)    -   17. A is told about C (N1)***    -   18. B is told about C (N1)    -   * The notation “add a rule (A, X, B, Y)” means “add a rule that        when data shows up from A using X it is sent to B using Y and        vice versa.”    -   ** The notation “told about A (X)” means “told that A has joined        the session and audio data should be sent to address X.”    -   *** When A is told about B (N) and later C (N), A only needs to        send to N once. NAAS will then send the data to both B and C.        The jam software should only send to whatever unique collection        of addresses it has. (NAAS users will only have the one address        they picked, but for non-NAAS users not all the addresses will        be unique.)

FIGS. 21A-B provide a swim lane diagram of an example embodiment for a music session start by music node A where music nodes B and C then join the session and where all three nodes (A, B, C) are NAAS participants. The swim lane diagram includes the NAAS server, the interactive music system server, and music nodes A, B, and C. Also, it is noted that embodiment 2110A in FIG. 21A connects at the bottom to the top of embodiment 2110B in FIG. 21B.

FIG. 21C is a swim lane diagram of an example embodiment 2120 for a music session stop where music nodes B and C leave the session and where all three nodes (A, B, C) are NAAS participants. The swim lane diagram includes the NAAS server, the interactive music system server, and music nodes A, B, and C.

Session Setup with Mixed NAAS and Non-NAAS

Setting up a session with A enabled for NAAS while B and C are not (changes are bracketed and italicized):

-   -   1. A starts the session
    -   2. A told to test NAAS addresses (N1, N2, N3, N4)
    -   3. A determines that N1 has the lowest latency
    -   4. B joins the session
    -   5. B told to test NAAS addresses (N1, N2, N3, N4)
    -   6. B determines that N1 has the lowest latency
    -   7. NAAS is told to add a rule (A, N1, B, N1)
    -   8. B is told about A (N1)
    -   9. A is told about B (N1)
    -   10. C joins the session
    -   11. C told to test NAAS addresses (N1, N2, N3, N4)
    -   12. C determines that N2 has the lowest latency
    -   13. NAAS is told to add a rule (A, N1, C, N2)
    -   14. [NAAS is told to add a rule (B, N1, C, N2)]*
        -   * omitted because B and C are not NAAS members
    -   15. C is told about A (N2)
    -   16. C is told about B [(N2)]
    -   17. A is told about C (N1)
    -   18. B is told about C [(N1)]

Note that the NAAS was not told about B to/from C, and B was told to send directly to C instead of to C (N1), and vice versa for C, which sends directly to B instead of to B (N2).

FIGS. 22A-B provide a swim lane diagram of an example embodiment for a music session start by music node A where music nodes B and C then join the session and where only music node C is a NAAS participant. The swim lane diagram includes the NAAS server, the interactive music system server, and music nodes A, B, and C. Also, it is noted that embodiment 2210A in FIG. 22A connects at the bottom to the top of embodiment 2210B in FIG. 22B.

FIG. 22C is a swim lane diagram of an example embodiment 2220 for a music session stop where music nodes B and C leave the session and where only music node C is a NAAS participant. The swim lane diagram includes the NAAS server, the interactive music system server, and music nodes A, B, and C.

Message Sequence Diagrams

Example control messages and sequences for session setup and teardown are provided with respect to FIGS. 20A-B, 21A-C, and 22A-C as indicated above. It is noted that for these swim lane diagrams, testing is shown once and then left out of the main diagrams for simplicity. Start and stop are similar and are also shown once and then omitted for simplicity. Further, it is noted that these swim lane diagrams provide example embodiments, and variations could be implemented.

Looking to the message sequence diagrams, FIG. 20A shows the session management messages that flow between music nodes when no NAAS is involved. In this flow, there are three music nodes: A, B, and C. Each MN has a unique session id, respectively Aid, Bid, and Cid. When an MN sends a message, the message includes its IP (Internet Protocol) address/name, its session id, and the id of the peer to which it wants the message to be delivered. The server uses this information to validate the source and destination before relaying the message to the destination music node. In FIG. 20A, A sends a "start session (Aid, A)" message to the server. The server uses the information in the message to instantiate a session object with id S, with the properties that A requested. The server returns S to A. Properties of the session can include the genre of music, the skill level of musicians that may join the session, whether the session is public or private, etc. A session object in the server is searchable by users looking for music sessions to join.

After the creation of session S by A, the user at music node B discovers the session by one of several methods. The server may send a notification message (e.g., email or instant message) to the user at B, inviting the user to join the session. The user at B may also search the server and discover the existence of session S. After the user at B discovers the existence of session S, the server provides a join session link for S that the user at B clicks to request to join the session. Thus, the user at music node B sends a join session message from B to the server as "join session (S, Bid, B)". The server validates the existence of S and that the user at music node B has the rights to join it, and if true, adds music node B to the session and returns OK. If B is not allowed to join the session, no further communication occurs to B with respect to the session.

At this point, the server notifies music node A that music node B has joined the session with the message to A, "join session (S, Bid, B)". Concurrently, a message is sent to music node B with the message "join session (S, Aid, A)". When these messages are received at A and B respectively, each node has the other's session id and music node name/IP address. This information is used by music node B to send a message via the server to music node A as "start audio (A, B)". Similarly, music node A sends a message to B with the request "start audio (B, A)". Both A and B use the server to negotiate the message flow needed to allow them to send audio to each other.

Similarly to the user at music node B, a user at music node C discovers session S and requests to join with a message to the server, "join session (S, Cid, C)". If C is allowed to join S, then the server notifies A and B that C has joined the session with the message "join session (S, Cid, C)". Concurrently, C is notified to join sessions with B and A via "join session (S, Aid, A)" and "join session (S, Bid, B)". The successful execution of the join session messages is followed by the messages "start audio (A, C)" and "start audio (B, C)" initiated by C to A and B respectively. Similarly, A sends the message "start audio (C, A)" to C, and B sends "start audio (C, B)" to C.

Music nodes A, B and C are now in session S.

FIG. 20B shows the graceful process of leaving a session when no NAAS is involved. A graceful departure from a session implies that the user at the music node (MN) requested to leave. An ungraceful departure happens when the music node (MN) is no longer able to communicate with its music node (MN) peers or with the server. In this case, the heartbeat messages that flow from the music node to the server stop, and the server proceeds to remove the music node from the session by sending messages to the nodes that are still in the session carrying the same message as if the unresponsive node had requested to leave the session.

The user at music node C requests to leave the session S. Music node C sends a message to the server, "leave session (S, Cid, C)". The server then sends messages to A and B respectively, "left session (S, Cid, C)". Concurrently, C sends stop audio messages to A and B. C sends "stop audio (C, A)" to A, and to B it sends "stop audio (C, B)". The server removes C from the session, and nodes A and B remove C as a peer that they will communicate with in the session.

Similarly, when music node B leaves the session, it sends "leave session (S, Bid, B)" to the server. The server then sends the message "left session (S, Bid, B)" to A. Music node B also concurrently sends "stop audio (B, A)" to music node A. Music node A removes B from the set of peers it will communicate with. The server removes B from the music nodes in the session S.

Finally, music node A leaves the session, and being the creator of the session, it may choose to terminate the session with the message "stop session (S, Aid, A)". Otherwise, it sends the message "leave session (S, Aid, A)" to the server. Typically, the stop session is implicit when the last node in the session leaves the session. When the server receives this message, it deletes the session object, and by definition, the session ceases to exist.

FIG. 21A shows the message flow for a music session setup where a NAAS server is involved. Here the NAAS server has four ISP (Internet Service Provider) terminations: T1, T2, T3, and T4. The NAAS server is hosted at an Internet exchange point, where it can have direct connections into the networks of various ISP vendors, represented by connections T1, T2, T3, and T4. The number of ISP terminations can be more or less. Logically, the NAAS may be viewed as a super music node that has access rights to all music sessions. The service uses business logic to filter the user music nodes that may participate in a session with the NAAS.

In this flow, music node A starts a session by sending a "start session (Aid, A)" message to the server. If music node A is not allowed to use the NAAS, the logic described before in FIG. 20A is followed. If A is allowed to use the NAAS, then the server sends a message to the NAAS informing it that A is joining the session. This message is called a setup (A). The semantics of a setup message are that A should invoke an algorithm that tests which ISP termination (T1-T4) on the NAAS gives the lowest latency of communication between the NAAS and music node A.

If the NAAS is able to accommodate more clients, it replies to the setup message to the server with "ok (T1, . . . , T4)". The NAAS registers music node name A as a node that it is authorized to communicate with. The server forwards a message to music node A to test which interface on the NAAS has the lowest latency communication, "test (T1, T2, T3, T4)". Music node A invokes a network latency-testing algorithm, and a start session update message is then generated to the server with the latency information for the NAAS interfaces, "start session (Aid, A, (ST1, ST2, ST3, ST4))". The server instantiates the session S and replies OK to A. The server relays this information to the NAAS as "assign address (S, A, (ST1, ST2, ST3, ST4))", and the NAAS caches this information by associating the lowest latency interface with music node A and session S. If two or more interfaces have the same delay, an algorithm is used to select one (e.g., load balancing, lower MAC address, etc.). The NAAS also binds the interface address with the lowest latency to A as the preferred address that it will use to send messages to music node A. This interface is referred to as NA. The NAAS replies OK upon successfully caching and binding from an "assign address" message.

Later, the user at music node B discovers session S and initiates a request to the server with "join session (S, Bid, B)". Similar to A, the process described for "test (T1, T2, T3, T4)" is invoked with music node B to find the lowest latency to the NAAS. Music node B ultimately replies to the server with "join session (S, Bid, B, (ST1, ST2, ST3, ST4))", which results in the message "assign address (S, B, (ST1, ST2, ST3, ST4))" being sent to the NAAS. The NAAS determines which ISP/network interface is the lowest latency path for communicating with B and binds that interface to B and session S. This interface is referred to as NB. It also uses the session id S to recognize that music nodes A and B need to communicate and adds a forwarding rule "add rule (S, A, NA, B, NB)". This rule authorizes messages to flow between nodes A and B in session S via interfaces NA and NB. The NAAS replies OK to the "assign address" message, and the server then relays OK to B's "join session" request. The reply to B carries the NAAS network interface for A that B should use to communicate with music node A.

Concurrently, the server sends the message "join session (S, Bid, NA)" to music node A and "join session (S, Aid, NB)" to music node B. Music nodes A and B do not send messages directly to the network address of each other. Rather, they send messages to each other via the NAAS, which serves as a packet relay. As such, at this point the NAAS instructs both A and B to start sending audio with the commands "start audio (NA, A)" and "start audio (NB, B)". Music node A sends audio messages to B by sending them to the NAAS interface IP address NA. The NAAS receives the message from A, determines that the message destination is music node B, and relays the message to B by sending it out interface NB to music node B's IP address. Similarly, messages from B to A are sent to the NAAS address NB. The NAAS determines that the destination of the message is music node A and sends the packet out network interface NA to music node A. Thus, audio flows between A and B relayed via the lowest latency paths they have to the NAAS.

FIG. 21B illustrates the message flow that occurs when music node C requests to join a session that includes music nodes A and B, which are already in a session with a NAAS as shown in FIG. 21A. As before, the server instructs C to perform a latency test against the NAAS with "test (T1, T2, T3, T4)". Music node C then reports the result to the server, which then sends "assign address (S, C, (ST1, ST2, ST3, ST4))" to the NAAS server. The NAAS binds the corresponding lowest latency interface NC to node C. The NAAS uses the session id S to determine that C is joining the session involving music nodes A and B, and adds the forwarding rules "add rule (S, A, NA, C, NC)" and "add rule (S, B, NB, C, NC)". This authorizes the flow of packets between music nodes A, B, and C.

The server then notifies A and B that C has joined the session with "join session (S, Cid, NA)" and "join session (S, Cid, NB)" sent to A and B respectively. Similarly, the messages "join session (S, Aid, NC)" and "join session (S, Bid, NC)" are sent to music node C. Thus, C sends messages to NAAS address NC to communicate with A and B.

With these rules in place, "continue audio" messages are sent to nodes A and B, and "start audio" messages are sent to node C. It is noted that because the NAAS handles the packet relay to music node C, music nodes A and B do not need to do anything further to send audio to music node C. Any audio packet sent by any music node in session S will be broadcast by the NAAS to the member music nodes using the bound interface for communicating with each destination music node. Music node C is also told to start sending audio to A and B by sending to NAAS address NC. The server command to music node C is "start audio (NC, C)".

A hybrid mode of operation is where the server may direct music nodes to use a peer-to-peer latency test. If the latency between peers is lower than the path via a NAAS server, the server may direct the peers to use the non-NAAS mode of communication described in FIGS. 20A and 20B.

FIG. 21C shows the message flow when music node C leaves a session involving a NAAS.

Music node C sends the message "leave session (S, Cid, C)". The message is relayed to the NAAS, which translates this as an action to drop the rules that allow communication with music node C in session S. Thus, the NAAS executes the commands "drop rule (S, A, NA, C, NC)" and "drop rule (S, B, NB, C, NC)" and finally releases the binding of node C to interface NC with the command "release address (S, C, NC)".

After each drop rule command, messages are sent to the corresponding music node to "stop audio (C, NC)". Finally, the server notifies the music nodes that C has left the session with "left session (S, Cid, NA)" and "left session (S, Cid, NB)" sent to music nodes A and B respectively.

Similarly, when music node B leaves the session, messages to remove the rules in the NAAS that allow communication with B are issued, and the interface binding for B is dropped. Finally, music node A leaves the session by requesting a "session stop (S, Aid, A)". This causes all resources (e.g., forwarding rules and interface bindings) associated with session S at the NAAS to be released. The server also destroys the session object S.

FIGS. 22A-B illustrate the message flows when a mix of NAAS-authorized and non-authorized music nodes are in a session. If no clients in a session are authorized to use the NAAS service, then they will use the peer-to-peer message flow described earlier for FIGS. 20A and 20B. If all music nodes are NAAS authorized, the communication setup/tear-down flow is as described in FIGS. 21A and 21B. When mixed authorization of music node access to a NAAS exists, it may cause the automatic elevation of the privileges of non-authorized nodes so that a QoS/SLA (Quality of Service/Service Level Agreement) guarantee to the authorized music nodes can be met.

Looking back to FIGS. 22A-B, an initial case is shown where music nodes A and B are in a session that does not involve a NAAS. This may be because they are not authorized, because the direct path latency between them is better than via a NAAS, or because of other sets of business logic or operational conditions (e.g., the NAAS server is down for maintenance). The flow used for A and B to enter the session is as described earlier for FIG. 20A. When music node C attempts to join the session, the server determines that the NAAS should be used. Music node C is directed to perform latency tests against the NAAS interfaces T1, T2, T3, and T4. Ultimately, an "assign address (S, C, (ST1, ST2, ST3, ST4))" is executed at the NAAS, and music node C's address is bound to its lowest latency interface to the NAAS as NC.

The server recognizes that music node C is joining a session involving music nodes A and B that are in a non-NAAS session. As music node C is now bound to the NAAS, the server directs music nodes A and B to perform network tests against the NAAS. This results in music nodes A and B also being bound to NAAS interfaces. The message sequence shows the flow for music node A first joining C in the session (FIG. 22A), followed by a similar sequence for music node B (FIG. 22B). The message sequence is as described earlier in FIG. 21A for music nodes B and C joining music node A in a NAAS session. FIG. 22B shows the latter part of the session join sequence.

FIG. 22C shows the leave session sequence, which is similar to the case described in FIG. 21C. The last music node to leave the NAAS session destroys the session.

One further implementation is that the last NAAS-authorized music node to leave the session causes the session to be destroyed and rebuilt as a non-NAAS music session.

Appendix B Further Example Embodiments

This appendix provides further functional block diagram examples for the interactive music system and related music nodes (MNs) and server system(s).

FIG. 23A is a block diagram of an example embodiment 2300 for inter-node session managers and data flow for the interactive music system including peer connections and session path transport communications. The MNs 112, 114, and 116 each include a music session manager that receives local channel (e.g., music track) information and uses peer connection information and a peer connection block to communicate with the other MNs. These communications can be, for example, implemented using UDP packets, using TCP/UDP packets communicated through a session bridge associated with the server 102, and/or through some other network communication technique. Each MN 112, 114, and 116 also includes a session transport module that communicates with the server and the other MNs through HTTP/TCP (hypertext transport protocol/transmission control protocol) packets. The session manager communicates with the session transport module and uses a channel view composer to display channel (e.g., music track) information to the user. The server 102 is connected to the MNs 112, 114, and 116 as a cloud-based service through the network 110.

FIG. 23B is a block diagram of an example embodiment 2350 for a peer connection block. A peer socket provides a communication interface for network communications with other MNs. A peer connection manager uses peer connection information to determine the communication protocol to use. For example, TCP can be used for communications through the server as a proxy, and UDP can be used for direct peer-to-peer communications. Input audio and chat data is received from the ICPs and is formatted with additional session information for transport to the other MNs. Received audio packets from the other MNs are parsed and output to the receive audio data processor. Encryption of outgoing packets and decryption of incoming packets can also be used. A latency probe module generates probe and response packets for the latency probe operations for the MN.

FIG. 24 is a block diagram of an example embodiment 2400 for music and chat communications from an MN to other MNs within a music session. Each of the MNs 112, 114, and 116 includes a monitor mixer for chat channels, ICPs or a bonding ICP (ICPB), and a playout module. Chat channels and music channels are output by each MN. Peer chat channels are processed by the monitor mixer, and peer music channels are processed by the playout module. For the embodiment depicted, MN 112 is shown as communicating its chat microphone channel and its music channels to MNs 114 and 116. The uplink bandwidth can be represented by the sum of the chat microphone bandwidth (BW) plus the music channel bandwidth (BW), times the number of peers (e.g., Uplink Bandwidth=(Chat Mic BW+Music Channel BW)*Peers). Fewer music channels help reduce bandwidth requirements, hence the need for ICP bonding (e.g., at the cost of individual instrument channel control at the peer receiver). For example, if the chat microphone bandwidth is 32 Kb/s, the music channel bandwidth is 64 Kb/s, and a session includes 5 people, each person will need an uplink bandwidth of (32+64)*4=384 Kb/s.
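The calculation from the example above, expressed directly in TypeScript (function name is illustrative):

    // Uplink bandwidth per the formula in the text:
    // (chat mic BW + music channel BW) * number of peers.
    function uplinkKbps(chatMicKbps: number, musicChannelKbps: number,
                        sessionSize: number): number {
      const peers = sessionSize - 1; // each member sends to every other member
      return (chatMicKbps + musicChannelKbps) * peers;
    }

    // Example from the text: 32 Kb/s chat + 64 Kb/s music, 5-person session.
    console.log(uplinkKbps(32, 64, 5)); // 384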

FIG. 25 is a block diagram of an example embodiment 2500 for an MN system embodiment including local ICPs (input channel processors) and peer ICPs (input channel processors). Embodiment 2500 is similar to embodiment 820 of FIG. 8B with an additional recording point 2501 being shown. It is noted that other recording points could also be used.

FIG. 26 is a block diagram of an example embodiment 2600 for a peer input channel processor. Audio packets from peer MNs are received and de-multiplexed by a de-multiplexer (demuxer) 2601. The demuxed audio packets for a first peer MN are provided to receive processor 2602. This continues for each peer MN, with the demuxed audio packets for an Nth peer MN being provided to receive processor 2604. Each of the receive processors 2602 . . . 2604 includes a deframer (e.g., extracts session identifier, session statistics, etc.), a receive report generator, a decoder, a resampler, and an effects module. Each of the receive processors 2602 . . . 2604 provides a remote channel out for the peer MN it is handling and also provides a raw remote audio output for that peer MN, as well.
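A minimal sketch of the demux step, assuming hypothetical types for the per-peer receive processors:

    // Hypothetical sketch of the peer ICP front end: demultiplex incoming
    // audio packets by originating peer and hand each payload to that
    // peer's receive processor (deframe, report, decode, resample, effects).
    interface ReceiveProcessor { process(payload: Uint8Array): void; }

    function demuxPeerPacket(
      peerId: number,
      payload: Uint8Array,
      processors: Map<number, ReceiveProcessor>,
    ): void {
      processors.get(peerId)?.process(payload); // one processor per peer MN
    }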

FIG. 27A is a block diagram of an example embodiment 2700 for a local input channel processor that captures audio inputs from an instrument (e.g., guitar, keyboard, voice, etc.), voice chat, or another audio input. Instrument or voice input is captured by a capture and formatter block and then provided to an effects block. Raw captured audio and effects audio are both output. A channel throttle arbiter, a stream encoder, and a channel framer are provided for high quality stream processing, medium quality stream processing, and low quality stream processing of the captured audio. A high quality broadcast encoder also receives the captured audio, and a channel framer receives the output of the high quality broadcast encoder. High quality, medium quality, and low quality throttle control signals associated with the peer MNs (e.g., from 0 to n peer MNs) are received by the channel throttle arbiters, respectively. The ICP outputs high quality audio frames, medium quality audio frames, and low quality audio frames to the peer MNs based upon these control signals. Broadcast frames are also output by the ICP. Other inputs and outputs are also provided.
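One plausible reading of the throttle arbiter, sketched under the assumption (not stated in the figure) that a tier is encoded only while some peer requests it:

    // Hypothetical sketch of a channel throttle arbiter: each peer signals
    // the quality tier it can currently accept, and the ICP emits frames
    // for a tier only when at least one peer has requested it.
    type QualityTier = "high" | "medium" | "low";

    function tierHasSubscribers(
      tier: QualityTier,
      peerRequests: QualityTier[], // throttle control signals from 0..n peers
    ): boolean {
      return peerRequests.includes(tier);
    }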

FIG. 27B is a block diagram of an example embodiment 2750 for a local input channel processor that captures audio inputs and bonds them together for a group of instruments. Multiple instrument or voice inputs are captured by capture blocks, and the captured audio inputs are mixed together by a music mixer to generate a group audio output. The output of the mixer is received by an encoder, and the encoded audio is provided to a channel framer. The channel framer outputs the group media packets to the peer MNs (e.g., from 0 to n peer MNs). A channel throttle receives controls from the peer MNs and provides controls to the music encoder. Other inputs and outputs are also provided.
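A minimal sketch of the bonding mix, using naive averaging as an assumed gain strategy (the disclosure does not specify one):

    // Hypothetical sketch of ICP bonding: mix several captured inputs into
    // one group channel so only a single music channel is sent per peer.
    function mixGroup(inputs: Float32Array[]): Float32Array {
      const frames = inputs[0]?.length ?? 0;
      const out = new Float32Array(frames);
      for (const input of inputs) {
        for (let i = 0; i < frames; i++) out[i] += input[i];
      }
      // Naive averaging to avoid clipping; a real mixer would apply per-input gains.
      for (let i = 0; i < frames; i++) out[i] /= inputs.length || 1;
      return out;
    }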

FIG. 27C is a block diagram of an example embodiment 2770 for a local input channel processor that captures audio inputs for a group of instruments and bonds these inputs together using a group mixer (e.g., input channel processor bonding). Embodiment 2770 captures multiple inputs and bonds them with the group mixer as provided by embodiment 2750 in FIG. 27B and also provides raw outputs and effects outputs as provided by embodiment 2700 of FIG. 27A. Embodiment 2770 also provides the high quality, medium quality, low quality, and broadcast level processing of embodiment 2700 of FIG. 27A.

FIGS. 28A-B are block diagrams of example embodiments for mixer architectures that can be utilized. Embodiment 2800 of FIG. 28A includes 1 to N audio channel capture blocks that provide captured audio to a mixer at a 48 kHz sample rate. Embodiment 2800 also includes 1 to N audio channel playout blocks that receive outputs from the mixer. A decoder and an encoder operating at 48 kHz are also provided. Resamplers are also used as needed to resample the captured audio or the output audio. A recorder also receives mixed audio from the mixer and makes recordings. Embodiment 2850 of FIG. 28B is similar to embodiment 2800 except that a 48 kHz or a 44.1 kHz sample rate is used. Optional resamplers are again provided if needed to resample the captured audio or output audio. Also, resamplers can be used with respect to the decoder and encoder if operating at a different sample rate than the mixer.
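The "as needed" resampler placement reduces to a simple rate comparison, sketched here with illustrative names:

    // Hypothetical sketch: a resampler is inserted only when a device or
    // codec rate differs from the mixer rate (48 kHz or 44.1 kHz here).
    function needsResampler(deviceRateHz: number, mixerRateHz: number): boolean {
      return deviceRateHz !== mixerRateHz;
    }

    needsResampler(44100, 48000); // true: resample before mixing
    needsResampler(48000, 48000); // false: pass through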

FIG. 29 is a block diagram of an example embodiment 2900 for virtual device bridge software that includes an application space having a client module and a DAW (digital audio workstation) module and a kernel having virtual audio inputs and outputs. The application client in an application space for a software stack communicates with a virtual audio input device in the kernel. A DAW within the application space receives an output from the virtual audio input device and provides audio outputs to a virtual audio output device in the kernel. The virtual audio output device provides audio outputs to the client application. The client application also communicates audio packets with the network or cloud.

FIGS. 30A-B are block diagrams of example embodiments for DAW data flow. Embodiment 3000 of FIG. 30A is similar to embodiment 1110 of FIG. 11A where the MN includes a live quality encoder and operates as a live broadcaster. Embodiment 3050 of FIG. 30B is also similar to embodiment 1110 of FIG. 11A where the MN can operate as a live broadcaster but also includes a recorder and an uploader to send the live broadcast to a server system where the server provides a broadcast service.

Appendix C Example API Descriptions and Details

Example API Descriptions

Here are the calls that the Client may make to the Server:

-   {Ok, Sid, AddrPort[ ]} startSession(Uid uid, AddrPort addr, AddrPortScore[ ] scores)
-   {Ok, AddrPort[ ]} joinSession(Sid sid, Uid uid, AddrPort addr, AddrPortScore[ ] scores)
-   Ok leaveSession(Sid sid, Uid uid, AddrPort addr)
-   Ok stopSession(Sid sid, Uid uid, AddrPort addr)

Here are the calls that the Server may make to the Client:

-   Ok joinedSession(Sid sid, Uid uid, AddrPort addr)
-   Ok leftSession(Sid sid, Uid uid, AddrPort addr)
-   AddrPortScore[ ] test(AddrPort[ ] addrs)

Here are the calls that the Server may make to NAAS:

-   AddrPort[ ] setupTest(AddrPort client)
-   Ok cancelTest(AddrPort client)
-   AddrPort assignAddress(Sid sid, AddrPort client, AddrPortScore[ ] scores)
-   Ok releaseAddress(Sid sid, AddrPort client, AddrPort assigned)
-   Ok addRule(Sid sid, AddrPort client1, AddrPort assigned1, AddrPort client2, AddrPort assigned2)
-   Ok dropRule(Sid sid, AddrPort client1, AddrPort assigned1, AddrPort client2, AddrPort assigned2)

Here are the calls that the NAAS may make to the Server:

None.

Clients may not contact NAAS directly and vice versa.
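As a hedged illustration only, the call signatures above might be rendered as TypeScript interfaces along the following lines. The types Ok, Sid, Uid, AddrPort, and AddrPortScore are modeled from the descriptions in this appendix; no concrete language binding is prescribed by this disclosure.

    // Types modeled on the text; all names are illustrative.
    type Ok = "Ok" | "Test" | "Error";
    type Sid = string;
    type Uid = string;
    interface AddrPort { host: string; port: number; }
    interface AddrPortScore {
      addr: AddrPort; minMs: number; maxMs: number; avgMs: number;
      sent: number; received: number;
    }

    // Client -> Server
    interface ServerApi {
      startSession(uid: Uid, addr: AddrPort, scores: AddrPortScore[] | null):
        { ok: Ok; sid: Sid | null; addrs: AddrPort[] | null };
      joinSession(sid: Sid, uid: Uid, addr: AddrPort,
                  scores: AddrPortScore[] | null):
        { ok: Ok; addrs: AddrPort[] | null };
      leaveSession(sid: Sid, uid: Uid, addr: AddrPort): Ok;
      stopSession(sid: Sid, uid: Uid, addr: AddrPort): Ok;
    }

    // Server -> Client
    interface ClientApi {
      joinedSession(sid: Sid, uid: Uid, addr: AddrPort): Ok;
      leftSession(sid: Sid, uid: Uid, addr: AddrPort): Ok;
      test(addrs: AddrPort[]): AddrPortScore[];
    }

    // Server -> NAAS (NAAS makes no calls to the Server)
    interface NaasApi {
      setupTest(client: AddrPort): AddrPort[] | null;
      cancelTest(client: AddrPort): Ok;
      assignAddress(sid: Sid, client: AddrPort,
                    scores: AddrPortScore[]): AddrPort | null;
      releaseAddress(sid: Sid, client: AddrPort, assigned: AddrPort): Ok;
      addRule(sid: Sid, client1: AddrPort, assigned1: AddrPort,
              client2: AddrPort, assigned2: AddrPort): Ok;
      dropRule(sid: Sid, client1: AddrPort, assigned1: AddrPort,
               client2: AddrPort, assigned2: AddrPort): Ok;
    }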

Example API Details

-   {Ok, Sid, AddrPort[ ]} startSession(Uid uid, AddrPort addr, AddrPortScore[ ] scores)
    -   The client requests that a new session be created. Uid is the unique id of the user making the request, and addr is the publicly visible address and port number of the client's UDP socket. Scores is initially passed as null.
    -   The user generally won't know their own publicly visible address or uid, but the user does know the port number of their socket. This is all the user needs to supply. The web server, upon receiving the request, fills in the uid and the publicly visible address before acting on the request.
    -   If this user is enabled to use NAAS and if NAAS is available, the initial request with null scores will be failed with Ok indicating "Test", sid returned as null, and an array of AddrPort to test. The client will test each AddrPort in the prescribed manner and resubmit the startSession request with the resulting scores.
    -   Status is returned in Ok as well as the newly minted Sid if the request succeeded. The sid is used to manipulate the session including inviting others to join. If NAAS is not enabled for this user, NAAS is not available, or if scores are submitted, the returned AddrPort array will be null.
    -   Note that testing is required of every client who joins a session which includes NAAS. This includes clients in sessions which did not acquire NAAS capability until a NAAS enabled user joined. See the Server to Client call test.
-   {Ok, AddrPort[ ]} joinSession(Sid sid, Uid uid, AddrPort addr, AddrPortScore[ ] scores)
    -   The client requests to join an existing session. Sid is the unique id of the session, uid is the unique id of the user making the request, and addr is the publicly visible address and port number of the client's UDP socket. Scores is initially passed as null.
    -   The user generally won't know their own publicly visible address or uid, but the user does know the port number of their socket. This is all the user needs to supply. The web server, upon receiving the request, fills in the uid and the publicly visible address before acting on the request.
    -   If this user is enabled to use NAAS and if NAAS is available, the initial request with null scores will be failed with Ok indicating "Test" and an array of AddrPort to test. The client will test each AddrPort in the prescribed manner and resubmit the joinSession request with the resulting scores.
    -   Status is returned in Ok. If NAAS is not enabled for this user, NAAS is not available, or if scores are submitted, the returned AddrPort array will be null.
    -   Note that testing is required of every client who joins a session which includes NAAS. This includes clients in sessions which did not acquire NAAS capability until a NAAS enabled user joined. See the Server to Client call test.
-   Ok leaveSession(Sid sid, Uid uid, AddrPort addr)
    -   The client requests to be removed from the session. Sid is the unique id of the session, uid is the unique id of the user making the request, and addr is the publicly visible address and port number of the client's UDP socket.
    -   The user generally won't know their own publicly visible address or uid, but the user does know the port number of their socket. This is all the user needs to supply. The web server, upon receiving the request, fills in the uid and the publicly visible address before acting on the request.
    -   If NAAS resources are allocated to this user, the resources are freed (cancelTest, dropRule, releaseAddress).
    -   If this is the last participant in the session, the session is also removed (stopSession). If other participants remain in the session, they are informed that this user has left (leftSession).
    -   Status is returned in Ok.
-   Ok stopSession(Sid sid, Uid uid, AddrPort addr)
    -   The client requests that the session be destroyed. Sid is the unique id of the session, uid is the unique id of the user making the request, and addr is the publicly visible address and port number of the client's UDP socket.
    -   The user generally won't know their own publicly visible address or uid, but the user does know the port number of their socket. This is all the user needs to supply. The web server, upon receiving the request, fills in the uid and the publicly visible address before acting on the request.
    -   The session is marked for destruction (nobody may join).
    -   Remaining users are notified that the other users have left the session (leftSession).
    -   If NAAS resources are allocated to this session, the resources are freed (cancelTest, dropRule, releaseAddress).
    -   The session is removed.
    -   Status is returned in Ok.
-   Ok joinedSession(Sid sid, Uid uid, AddrPort addr)
    -   The server notifies that the specified user has joined the session. Sid is the unique id of the session, uid is the unique id of the user that joined, and addr is the publicly visible address and port number of the client's UDP socket (or the assigned NAAS address of the receiving user if NAAS is involved).
    -   The receiving client should begin sending to the specified address/port if it isn't already.
    -   If the uid had previously "joined" with a different address, the new address replaces the old and operation continues.
    -   Status is returned in Ok.
-   Ok leftSession(Sid sid, Uid uid, AddrPort addr)
    -   The server notifies that the specified user has left the session. Sid is the unique id of the session, uid is the unique id of the user that left, and addr is the publicly visible address and port number of the client's UDP socket (or the assigned NAAS address of the receiving user if NAAS is involved).
    -   The receiving client should stop sending to the specified address/port unless any other participants also have that same address (e.g., if NAAS is involved).
    -   Status is returned in Ok.
-   AddrPortScore[ ] test(AddrPort[ ] addrs)
    -   The server notifies the client that a test of addresses is required to determine which address is the best for this client. This test is required when NAAS has become involved in the session. The user should execute a ping test on each address and return the scores to the server. See startSession and joinSession for implicit test operations using this same technique.
    -   A UDP packet sent to the specified address will be returned (echoed) as it was received. The client should construct a packet of some moderate size (135 bytes will do) with an embedded high precision timestamp and sequence number, then send it to the address and receive the response. Enough packets should be sent to ensure a good sample. The first packet (sent and received) often takes substantially longer than the rest, and so should be excluded from the stats.
    -   Min/max/average of the rest should be returned in the scoring structure, in millisecond units, as well as the count sent/received. The client should send a packet, wait up to 50 ms for the response, and send the next one as soon as the response is received or deemed missing, perhaps sending a total of 10-20 packets. Late packets should be ignored if they finally arrive (by using the sequence number). Stats should be calculated starting with the second packet received, and only include received packets. A sketch of this test loop appears after this list.
-   AddrPort[ ] setupTest(AddrPort client)
    -   The server requests that NAAS set up a test environment for the specified client address and return all appropriate addresses for the test.
    -   If NAAS somehow fails to set up the test, null is returned.
-   Ok cancelTest(AddrPort client)
    -   The server requests that NAAS remove a previously set up test environment for the specified client.
    -   Status is returned in Ok.
-   AddrPort assignAddress(Sid sid, AddrPort client, AddrPortScore[ ] scores)
    -   The server requests that NAAS use the scores to assign an address appropriate for the specified client address. Sid is the unique id of the session, and client is the publicly visible address and port number of the client's UDP socket.
    -   Any previous test setup is cancelled.
    -   The assigned address is returned, or if there was a problem assigning an address, null is returned.
-   Ok releaseAddress(Sid sid, AddrPort client, AddrPort assigned)
    -   The server requests that NAAS remove any previously assigned address. Sid is the unique id of the session, client is the publicly visible address and port number of the client's UDP socket, and assigned is the previously assigned address.
    -   Any rules involving the client and assigned addresses will be dropped (see addRule, dropRule).
    -   Status is returned in Ok.
-   Ok addRule(Sid sid, AddrPort client1, AddrPort assigned1, AddrPort client2, AddrPort assigned2)
    -   The server requests that NAAS add a rule mapping one client to another. Sid is the unique id of the session, client1 is the public address of the first client, assigned1 is the assigned address of the first client (per assignAddress), client2 is the public address of the second client, and assigned2 is the corresponding assigned address.
    -   Any packet arriving at NAAS from client1 to assigned1 will be sent from assigned2 to client2, and vice versa.
    -   Assigned1 and assigned2 must be addresses assigned and not yet released by this NAAS instance.
    -   Status is returned in Ok.
-   Ok dropRule(Sid sid, AddrPort client1, AddrPort assigned1, AddrPort client2, AddrPort assigned2)
    -   The server requests that NAAS drop a rule mapping one client to another. Sid is the unique id of the session, client1 is the public address of the first client, assigned1 is the assigned address of the first client (per assignAddress), client2 is the public address of the second client, and assigned2 is the corresponding assigned address.
    -   Any packet arriving at NAAS from client1 to assigned1 will no longer be sent from assigned2 to client2, and vice versa.
    -   Assigned1 and assigned2 must be addresses assigned and not yet released by this NAAS instance.
    -   Status is returned in Ok.
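The sketch referenced in the test description above follows. This is a minimal TypeScript sketch, assuming a hypothetical Echo transport that performs the UDP send/receive and timeout handling; it is not a prescribed implementation.

    // Hypothetical sketch of the ping test loop described above. A real
    // client would send ~135-byte UDP packets carrying a sequence number
    // and a high precision timestamp.
    interface Echo {
      // Resolves to the round-trip time in ms, or null if no response
      // arrives within 50 ms (late responses are discarded by sequence number).
      ping(seq: number): Promise<number | null>;
    }

    async function pingScore(echo: Echo, count = 15) {
      const samples: number[] = [];
      let received = 0;
      for (let seq = 0; seq < count; seq++) {
        const rtt = await echo.ping(seq);
        if (rtt === null) continue;
        received++;
        if (received === 1) continue; // first received packet is often an outlier
        samples.push(rtt);
      }
      if (samples.length === 0) {
        return { minMs: 0, maxMs: 0, avgMs: 0, sent: count, received };
      }
      return {
        minMs: Math.min(...samples),
        maxMs: Math.max(...samples),
        avgMs: samples.reduce((a, b) => a + b, 0) / samples.length,
        sent: count,
        received,
      };
    }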

Further modifications and alternative embodiments of the embodiments described herein will be apparent to those skilled in the art in view of this description. It will be recognized, therefore, that the inventions described herein are not limited by these example arrangements. Accordingly, this description is to be construed as illustrative only, and it is to be understood that the embodiments shown and described herein are to be taken as example embodiments. Various changes may be made in the implementations and architectures, and different embodiments can be implemented. For example, equivalent elements may be substituted for those illustrated and described herein, and features can be utilized independently of other features, all as would be apparent to one skilled in the art after having the benefit of this description.

What is claimed is:
1. An interactive music server system, comprising: a network interface; one or more processing devices configured to communicate network packets through the network interface with interactive music client systems associated with one or more interactive music sessions; wherein the one or more processing devices are further configured to communicate with the interactive music client systems to determine operational parameters associated with the interactive music client systems.
2. The interactive music server system of claim 1, wherein the one or more processing devices are further configured to instruct the interactive music client systems to perform latency tests with other interactive music client systems to generate latency score results as the operational parameters, and wherein the one or more processing devices are further configured to receive the latency score results from the interactive music client systems and to store the latency score results in one or more data storage systems.
3. The interactive music server system of claim 2, wherein the one or more processing devices are further configured to use the stored latency score results for one or more interactive music client systems to predict latency score results for one or more different interactive music client systems.
4. The interactive music server system of claim 3, wherein the one or more processing devices are further configured to predict latency score results based upon a shared internet service provider and a shared local geographic area between interactive music client systems.
5. The interactive music server system of claim 4, wherein the shared local geographic area comprises at least one of a shared city or a shared zip code.
6. The interactive music server system of claim 2, wherein the latency score results for the latency tests include one or more of the following: network latency including time for an audio packet to travel within a network between interactive music client systems, a transmit latency including time for an interactive music client system to generate a transmit audio packet, a receive latency including time for an interactive music client system to process a receive audio packet, or a latency associated with communications through a proxy server.
7. The interactive music server system of claim 2, wherein the one or more processing devices are further configured to limit latency tests based upon one or more filters.
8. The interactive music server system of claim 7, wherein the one or more filters comprise at least one of the following: a distance filter associated with geographic distance between interactive music client systems, a frequency filter associated with a rate of latency tests, or existence of stored latency data suitable for predictive purposes between a pair of interactive music clients.
9. The interactive music server system of claim 2, wherein the one or more processing devices are further configured to receive an access request from an interactive music client system to join an interactive music session and to use latency score results associated with the requesting interactive music client system to allow or disallow the access request.
10. The interactive music server system of claim 9, wherein the one or more processing devices are further configured to allow an interactive music client system associated with the interactive music session to control approval or disapproval of the access request based upon latency score results.
11. The interactive music server system of claim 10, wherein the interactive music session is a currently active session or a future scheduled session.
12. The interactive music server system of claim 2, wherein the one or more processing devices are further configured to allow an interactive music client system to search, filter, or order a displayed list of other interactive music client systems based upon latency score results.
13. The interactive music server system of claim 1, wherein the one or more processing devices are further configured to communicate with the interactive music client systems to determine internal capabilities information for the interactive music client systems as the operational parameters, and wherein the one or more processing devices are further configured to receive the internal capabilities information from the interactive music client systems and to store the internal capabilities information in one or more data storage systems.
14. The interactive music server system of claim 13, wherein the internal capabilities information comprises one or more of the following: concurrent decode capabilities, packet processing rate capabilities, audio processing capabilities, video processing capabilities, or network bandwidth capabilities.
15. The interactive music server system of claim 13, wherein the one or more processing devices are further configured to receive an access request from an interactive music client system to join an interactive music session and to use internal capabilities information associated with the requesting interactive music client system to allow or disallow the access request.
16. The interactive music server system of claim 13, wherein the one or more processing devices are further configured to instruct interactive music client systems within an interactive music session to apply packet rate throttling based upon internal capabilities information for at least one of the interactive music client systems within the interactive music session.
17. A method to determine operational parameters for an interactive music system, comprising: communicating network packets with interactive music client systems associated with one or more interactive music sessions; and communicating with the interactive music client systems to determine operational parameters associated with the interactive music client systems.
18. The method of claim 17, further comprising instructing the interactive music client systems to perform latency tests with other interactive music client systems to generate latency score results as the operational parameters, receiving the latency score results from the interactive music client systems, and storing the latency score results in one or more data storage systems.
19. The method of claim 18, further comprising using the stored latency score results for one or more interactive music client systems to predict latency score results for one or more different interactive music client systems.
20. The method of claim 19, further comprising predicting latency score results based upon a shared internet service provider and a shared local geographic area between interactive music client systems.
21. The method of claim 20, wherein the shared local geographic area comprises at least one of a shared city or a shared zip code.
22. The method of claim 18, wherein the latency score results for the latency tests include one or more of the following: network latency including time for an audio packet to travel within a network between interactive music client systems, a transmit latency including time for an interactive music client system to generate a transmit audio packet, a receive latency including time for an interactive music client system to process a receive audio packet, or a latency associated with communications through a proxy server.
23. The method of claim 18, further comprising limiting latency tests based upon one or more filters.
24. The method of claim 23, wherein the one or more filters comprise at least one of the following: a distance filter associated with geographic distance between interactive music client systems, a frequency filter associated with a rate of latency tests, or existence of stored latency data suitable for predictive purposes between a pair of interactive music clients.
25. The method of claim 18, further comprising receiving an access request from an interactive music client system to join an interactive music session and using latency score results associated with the requesting interactive music client system to allow or disallow the access request.
26. The method of claim 25, further comprising allowing an interactive music client system associated with the interactive music session to control approval or disapproval of the access request based upon latency score results.
27. The method of claim 26, wherein the interactive music session is a currently active session or a future scheduled session.
28. The method of claim 19, further comprising allowing an interactive music client system to search, filter, or order a displayed list of other interactive music client systems based upon latency score results.
29. The method of claim 17, further comprising communicating with the interactive music client systems to determine internal capabilities information for the interactive music client systems as the operational parameters, receiving the internal capabilities information from the interactive music client systems, and storing the internal capabilities information in one or more data storage systems.
30. The method of claim 29, wherein the internal capabilities information comprises one or more of the following: concurrent decode capabilities, packet processing rate capabilities, audio processing capabilities, video processing capabilities, or network bandwidth capabilities.
31. The method of claim 29, further comprising receiving an access request from an interactive music client system to join an interactive music session and using internal capabilities information associated with the requesting interactive music client system to allow or disallow the access request.
32. The method of claim 29, further comprising instructing interactive music client systems within an interactive music session to apply packet rate throttling based upon internal capabilities information for at least one of the interactive music client systems within the interactive music session.